Abstract 



We introduce the s-Plex Cluster Vertex Deletion problem. Like the Cluster 
Vertex Deletion problem, it is NP-hard and motivated by graph-based data clustering. 
While the task in Cluster Vertex Deletion is to delete vertices from a graph so 
that its connected components become cliques, the task in s-Plex Cluster Vertex 
Deletion is to delete vertices from a graph so that its connected components become 
s-plexes. An s-plex is a graph in which every vertex is nonadjacent to at most s − 1 other 
vertices; a clique is a 1-plex. In contrast to Cluster Vertex Deletion, s-Plex 
Cluster Vertex Deletion allows one to balance the number of vertex deletions against 
the sizes and the density of the resulting clusters, which are s-plexes instead of cliques. 

The focus of this work is the development of provably efficient and effective data 
reduction rules for s-Plex Cluster Vertex Deletion. In terms of fixed-parameter 
algorithmics, these yield a so-called problem kernel. A similar problem, s-Plex Editing, 
where the task is the insertion or the deletion of edges so that the connected components of 
a graph become s-plexes, has also been studied in terms of fixed-parameter algorithmics. 
Using the number of allowed graph modifications as parameter, we expect typical 
parameter values for s-Plex Cluster Vertex Deletion to be significantly lower 
than for s-Plex Editing, because one vertex deletion can correspond to a large number of 
edge deletions. This holds out the prospect of faster fixed-parameter algorithms for 
s-Plex Cluster Vertex Deletion. 
1 Introduction 



Data clustering problems are of great importance in the disciplines of machine learning, 
pattern recognition, and data mining [3]. Given a data set, one can define a measure 
of similarity on data pairs. The goal in data clustering is to partition the data set into 
clusters so that the elements within a cluster are similar, while there are fewer similarities 
between elements in different clusters. Mapping clustering tasks into graph-theoretic 
models allows the usage of the broad variety of graph algorithms to process and cluster 
data [23]. Usually, the similarity between data records is mapped to a graph G as follows: 
each vertex in G corresponds to a data record, and an edge between two vertices in G 
exists if and only if the similarity of the corresponding data records exceeds a certain 
threshold. This threshold is specific to the actual clustering problem. An obvious 
requirement on clusters is that every data pair within a cluster be similar. A cluster can 
therefore be interpreted as a complete graph, also called a clique. Since there should be 
only few similarities between elements of different clusters, the graph G 
constructed from our data would ideally consist of isolated cliques only. Such a graph is 
called a cluster graph. For real-world data, it is unrealistic to expect G to be a cluster 
graph. We could modify G to become a cluster graph, but because we want to avoid 
excessive perturbation of the input data, the graph should be modified only modestly. 
One way to model this task is Cluster Vertex Deletion [14]. 

Cluster Vertex Deletion 

Instance: An undirected graph G = (V, E) and a natural number k. 
Question: Is there a vertex set S ⊆ V with |S| ≤ k such that deleting all vertices 
in S from G results in a graph where each connected component forms a clique? 

This problem corresponds to discarding at most k data records in order to find a plausible 
data clustering. We can regard the discarded data records as outliers. Although Cluster 
Vertex Deletion is a very intuitive model of graph-based data clustering, it is very 
restrictive as it requires every data pair in a cluster to be similar. Cluster Vertex 
Deletion offers no option to relax this requirement, so that we could allow for a few 
dissimilarities within the resulting clusters. It is desirable to balance the 
amount of discarded data against the number of dissimilarities within a cluster. Also, 
inaccuracies in the data could make it impossible to find satisfactory clusterings using Cluster 
Vertex Deletion, yielding too many or too small clusters. Therefore, we 
weaken the requirement that every connected component form a clique. In 1978, Seidman and 
Foster [24] introduced the following generalization of the clique concept: 

Definition 1.1. For s ≥ 1, an s-plex is a graph G = (V, E) such that every vertex in V 
is adjacent to at least |V| − s other vertices in V. 
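Definition 1.1 can be checked directly by counting, for each vertex, its neighbors inside the vertex set. The following Python sketch is our own illustration, not part of the original text: it assumes the graph is given as a dictionary mapping each vertex to the set of its neighbors, and the function name is_s_plex is hypothetical.

```python
def is_s_plex(vertices, adj, s):
    """Check Definition 1.1 for the induced subgraph G[vertices].

    adj maps every vertex of G to the set of its neighbors (no self-loops).
    G[vertices] is an s-plex if every vertex has at least |vertices| - s
    neighbors inside the set.
    """
    vs = set(vertices)
    required = len(vs) - s
    return all(len(adj[v] & vs) >= required for v in vs)


# Usage: a path on three vertices is a 2-plex but not a clique (1-plex).
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(is_s_plex(path3, path3, 1), is_s_plex(path3, path3, 2))  # False True
```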
For example, a clique is a 1-plex. By modeling clusters using s-plexes instead of cliques, 
we allow each data record to be dissimilar to at most s − 1 other data records within the same 
cluster. Although the s-plex concept was introduced as early as 1978, it has only 
recently become subject to algorithmic research [2, 11, 18, 20, 26]. In this work, we 
introduce the s-Plex Cluster Vertex Deletion problem. 

s-Plex Cluster Vertex Deletion 

Instance: An undirected graph G = (V, E) and a natural number k. 
Question: Is there a vertex set S ⊆ V with |S| ≤ k such that deleting all vertices 
in S from G results in a graph where each connected component forms an s-plex? 

In the following, we will call a graph that has only s-plexes as connected components 
an s-plex cluster graph. For each s, the s-Plex Cluster Vertex Deletion problem 
yields a different clustering model. In each model, s determines the "density" of the 
resulting clusters and with that the dissimilarities that are allowed within each cluster. 
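Whether a graph is an s-plex cluster graph can be tested by computing its connected components and checking Definition 1.1 on each of them. The sketch below assumes the same adjacency-dictionary representation; the optional argument removed (our own addition) lets it test G − S for a candidate vertex set S, and all names are hypothetical.

```python
from collections import deque


def connected_components(adj, removed=frozenset()):
    """Vertex sets of the connected components of G - removed (breadth-first search)."""
    seen = set(removed)
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


def is_s_plex_cluster_graph(adj, s, removed=frozenset()):
    """True if every connected component of G - removed is an s-plex."""
    for component in connected_components(adj, removed):
        required = len(component) - s
        if any(len(adj[v] & component) < required for v in component):
            return False
    return True
```

For instance, a path on four vertices is not a 2-plex cluster graph, but deleting its second vertex leaves a 2-plex cluster graph, so S = {1} is a solution for k = 1 in that case.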

Fixed-Parameter Algorithmics. In this work, we study the s-Plex Cluster Vertex Deletion problem in terms of fixed-parameter algorithmics. Fixed-parameter 
algorithmics aims at a multivariate complexity analysis of problems without giving 
up the demand for finding optimal solutions [6, 8, 21]. A parameterized problem is a 
language L ⊆ Σ* × ℕ, where Σ is a finite alphabet. The second component is called the 
parameter of the problem. The s-Plex Cluster Vertex Deletion problem is a parameterized problem with the input G and the parameter k. A parameterized problem L 
is fixed-parameter tractable if it can be determined in f(k) · |x|^O(1) time whether (x, k) ∈ L, 
where f is a computable function only depending on k. The corresponding complexity 
class is called FPT. 

Given a parameterized problem instance (x, k), reduction to a problem kernel or 
kernelization means to transform (x, k) into an instance (x', k') in polynomial time such 
that the size of x' is bounded from above by some function only depending on k, that k' ≤ k, 
and that (x, k) is a yes-instance if and only if (x', k') is a yes-instance. We refer to (x', k') as a 
problem kernel. Kernelization enables us to develop provably efficient and effective data 
reduction rules. Refer to Guo and Niedermeier [13] for a survey on problem kernelization. 
In this work, we present a kernelization for s-Plex Cluster Vertex Deletion. 

Terminology. We only consider undirected graphs G = (V, E), where V is the set of 
vertices and E is the set of edges. Throughout this work, we use n := |V| and m := |E|. 
We call two vertices v, w ∈ V adjacent or neighbors if {v, w} ∈ E. The neighborhood N(v) 
of a vertex v ∈ V is the set of vertices that are adjacent to v. For a vertex set U ⊆ V, 
we set N(U) := (⋃_{v∈U} N(v)) \ U. We call a vertex v ∈ V adjacent to V' ⊆ V if v has 
a neighbor in V'. Analogously, we extend this definition and call a vertex set U ⊆ V 
adjacent to a vertex set W ⊆ V with W ∩ U = ∅ if N(U) ∩ W ≠ ∅. A path in G from v_1 
to v_ℓ is a sequence (v_1, v_2, ..., v_ℓ) ∈ V^ℓ of vertices with {v_i, v_{i+1}} ∈ E for i ∈ {1, ..., ℓ − 1}. 
We call two vertices v and w connected in G if there exists a path from v to w in G. 
For a set of vertices V' ⊆ V, the induced subgraph G[V'] is the graph over the vertex 
set V' with the edge set {{v, w} ∈ E | v, w ∈ V'}. For V' ⊆ V, we use G − V' as an 
abbreviation for G[V \ V']. 
Figure 1.1: Minimal forbidden induced subgraphs for s = 2. 

Related Work. The two "sister problems" of s-Plex Cluster Vertex Deletion, 
namely s-Plex Editing and Cluster Vertex Deletion, have been subject to recent 
research [11, 14]. The goal of the s-Plex Editing problem is to transform a graph 
into an s-plex cluster graph by inserting or deleting at most k edges. For Cluster 
Vertex Deletion, Hüffner et al. [14] have developed fixed-parameter algorithms using 
the iterative compression technique [12] introduced by Reed et al. [22]. Their 
algorithm solves Cluster Vertex Deletion in O(2^k · n^2 (m + n log n)) time, where k is 
the number of allowed vertex deletions. Guo et al. [11] have shown a problem kernel with 
O(ks^2) vertices for s-Plex Editing, where k is the number of allowed edge modifications. 
They also have developed the following forbidden induced subgraph characterization for 
s-plex cluster graphs. 

Theorem 1.1 (Guo et al. [11]). Let G = (V, E) be a graph. Let F be the set of all 
connected graphs with at most |V| vertices that contain a vertex that is nonadjacent to s 
other vertices. The graph G is an s-plex cluster graph if and only if it does not contain 
any graph from F as an induced subgraph. 
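For s = 2, the forbidden induced subgraphs that matter have four vertices (Figure 1.1): a connected graph on four vertices contains a vertex nonadjacent to two others exactly when some vertex has only one neighbor inside it. For experiments on small graphs, one might replace the linear-time algorithm of Guo et al. [11] by an exhaustive search over all 4-vertex subsets. The sketch below is such a brute-force stand-in, with hypothetical names; it takes O(n^4) time and is not their algorithm.

```python
from itertools import combinations


def _is_connected(vertex_set, adj):
    """Depth-first search restricted to the induced subgraph G[vertex_set]."""
    start = next(iter(vertex_set))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] & vertex_set:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertex_set


def find_fisg_s2(adj, removed=frozenset()):
    """Brute-force search for a 4-vertex FISG of 2-plex cluster graphs in G - removed.

    A 4-vertex FISG for s = 2 is a connected induced subgraph containing a
    vertex with exactly one neighbor inside it (that vertex is nonadjacent
    to the other two). Returns the vertex set as a frozenset, or None.
    Runs in O(n^4) time; NOT the O(s(n + m))-time algorithm of Guo et al.
    """
    alive = sorted(v for v in adj if v not in removed)
    for quad in combinations(alive, 4):
        q = frozenset(quad)
        if _is_connected(q, adj) and any(len(adj[v] & q) == 1 for v in q):
            return q
    return None
```

On a path 0-1-2-3, the function returns {0, 1, 2, 3}; on a 4-cycle, which is a 2-plex, it returns None.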

Guo et al. [11] have also shown the stronger result that, for each natural number s, there 
exists a natural number d ∈ O(s + √s) such that if a graph G is not an s-plex cluster 
graph, then G contains a forbidden induced subgraph (FISG) with at most d vertices. 
They present an algorithm that, if G is not an s-plex cluster graph, finds such a FISG 
in G in O(s(n + m)) time. If s = 2 and if G is not a 2-plex cluster graph, then their 
algorithm always finds one of the three FISGs shown in Figure 1.1. We can solve s-Plex 
Cluster Vertex Deletion by repeatedly finding a FISG with at most d vertices in 
O(s(n + m)) time and then branching into all possibilities of deleting one of its vertices. 
This yields a trivial search tree algorithm that solves s-Plex Cluster Vertex Deletion 
in O(d^k · s(n + m)) time. Algorithms with a smaller exponential term in the running time 
can be obtained by employing the d-Hitting Set problem: 

d-Hitting Set 

Instance: A set H, a collection C ⊆ {H' ⊆ H | |H'| ≤ d} of subsets, and a natural 
number k. 

Question: Is there a hitting set S ⊆ H with |S| ≤ k such that each set in C contains 
an element of S? 

We obtain a d-Hitting Set instance (H, C, k) from an s-Plex Cluster Vertex 
Deletion instance (G, k) as follows: we use the vertex set of G as H; for each FISG F 
in G containing at most d vertices, we add the vertex set of F to C. Because each 
element in C corresponds to a FISG with at most d vertices, we have |C| ∈ O(n^d). 
Because this bound is exponential in d, it is practically infeasible to transform an s-Plex 
Cluster Vertex Deletion instance into a d-Hitting Set instance without prior 
data reduction. We can solve d-Hitting Set using a trivial O(d^k · |C|)-time search tree 
algorithm: we repeatedly choose a set from the collection C that contains no element of the 
current hitting set and branch into all possibilities of adding one of its elements to the 
hitting set. Faster algorithms for d-Hitting Set are 
known [21]. For example, consider the special case s = 2. The FISGs for 2-Plex 
Cluster Vertex Deletion are shown in Figure 1.1. The trivial search tree algorithm 
for 2-Plex Cluster Vertex Deletion (as discussed above) runs in O(4^k (n + m)) 
time. We can solve an equivalent 4-Hitting Set instance in O(3.076^k + |C|) time by 
combining Wahlström's O(2.076^k + |C|)-time algorithm for 3-Hitting Set [25] with 
iterative compression, as discussed by Dom et al. [5]. 
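The reduction to 4-Hitting Set for s = 2 and the trivial search tree described above can be sketched in Python as follows. This is an illustration under our own assumptions: the collection C is built by brute force over all 4-vertex subsets (so it takes O(n^4) time), the branching is the trivial O(d^k · |C|) search tree rather than Wahlström's algorithm, and the function names are hypothetical.

```python
from itertools import combinations


def fisg_collection_s2(adj):
    """Collection C of a 4-Hitting Set instance: vertex sets of all 4-vertex FISGs (s = 2)."""
    collection = []
    for quad in combinations(sorted(adj), 4):
        q = frozenset(quad)
        seen, stack = {quad[0]}, [quad[0]]  # DFS inside G[q] to test connectivity
        while stack:
            v = stack.pop()
            for w in adj[v] & q:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen == q and any(len(adj[v] & q) == 1 for v in q):
            collection.append(q)
    return collection


def hitting_set(collection, k, chosen=frozenset()):
    """Trivial O(d^k * |C|) search tree for d-Hitting Set.

    Picks a set not yet hit by `chosen` and branches on each of its elements.
    Returns a hitting set with at most len(chosen) + k elements, or None.
    """
    unhit = next((c for c in collection if not (c & chosen)), None)
    if unhit is None:
        return set(chosen)
    if k == 0:
        return None
    for x in sorted(unhit):
        result = hitting_set(collection, k - 1, chosen | {x})
        if result is not None:
            return result
    return None


# Usage: two disjoint paths on four vertices need two vertex deletions.
two_p4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2},
          4: {5}, 5: {4, 6}, 6: {5, 7}, 7: {6}}
C = fisg_collection_s2(two_p4)
print(len(C), hitting_set(C, 1), hitting_set(C, 2))  # 2 None {0, 4}
```

A set S hits every set in C exactly when G − S contains no 4-vertex FISG, because induced subgraphs of G − S are induced subgraphs of G that avoid S.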

The forbidden induced subgraph characterization by Guo et al. [11] implies that every 
induced subgraph of an s-plex cluster graph is again an s-plex cluster graph. The 
property of being an s-plex cluster graph is thus hereditary. Lewis and Yannakakis [16] 
have shown that vertex deletion problems for nontrivial hereditary graph properties are NP-hard. 
Because it can be verified in polynomial time whether a graph contains a FISG for s-plex 
cluster graphs, s-Plex Cluster Vertex Deletion is in NP. As a consequence, we 
can conclude that s-Plex Cluster Vertex Deletion is NP-complete. Further, Lund 
and Yannakakis [17] have shown that vertex deletion problems for hereditary graph 
properties are constant-factor approximable and MAX SNP-hard if the graph property 
admits a characterization by a finite number of FISGs. Because s-plex cluster graphs 
are characterized by a finite number of FISGs, finding a minimum solution for s-Plex 
Cluster Vertex Deletion is constant-factor approximable and MAX SNP-hard. 

Our contributions. We show a problem kernel with O(k^2) vertices for 2-Plex Cluster 
Vertex Deletion, which can be found in O(kn^2) time. We then generalize 
this kernelization algorithm to show a problem kernel with O(k^2 s^3) vertices for s-Plex 
Cluster Vertex Deletion, which can be found in O(ksn^2) time. 
2 Kernelization for 2-Plex Cluster Vertex Deletion 



In this chapter, we transform a 2-Plex Cluster Vertex Deletion instance (G, k) 
into a problem kernel (G', k'). To this end, we present a series of data reduction rules 
that remove vertices from G so that the maximum number of vertices in the resulting 
graph G' depends only on the parameter k. These data reduction rules also compute the 
new parameter k' ≤ k. For each data reduction rule, we show that it can be carried out 
in polynomial time and that it is correct, that is, we show that (G, k) is a yes-instance if 
and only if (G', k') is a yes-instance. 

Assume that we are given a 2-Plex Cluster Vertex Deletion instance (G,k). 
We want to apply a series of data reduction rules to G so that we can bound the size of G 
by a function only depending on the parameter k. To structure the graph G, we first 
search for a constant-factor approximate solution X so that each connected component 
in G − X is a 2-plex. This partitions the graph as shown in Figure 2.1. To bound the 
overall size of G by a function only depending on the parameter k, we independently 
bound the sizes of G − X and X by functions only depending on k. 



Figure 2.1: Constant-factor approximate solution X and the graph G − X. 

To bound the size of X, we use that X is a constant-factor approximate solution. If (G, k) 
is a yes-instance, then G can be transformed into a 2-plex cluster graph by at most k 
vertex deletions. This implies that the size of X is at most ck for some constant factor c. 
In particular, the maximum size of X only depends on k. If X contains more than ck 
vertices, we stop our kernelization algorithm and output that (G, k) is a no-instance. 

It remains to bound the size of G − X by a function only depending on the parameter k. 
To this end, we present data reduction rules to independently bound the number and the 
sizes of the connected components in G − X by functions only depending on k. Bounding 
the sizes of the connected components is the most sophisticated part of our kernelization 
algorithm. To this end, we employ graph separators and introduce a generalization of 
the graph module concept [9, 19] in Section 2.3. 

Summarizing, we obtain a problem kernel for a 2-Plex Cluster Vertex Deletion 
instance (G, k) by executing the following steps: 

1. Find a constant-factor approximate solution X such that G − X is a 2-plex cluster 
graph. This is the subject of Section 2.1. Because X is a constant-factor approximate 
solution, the size of X is bounded by a function only depending on the 
parameter k. 

2. Bound the number of connected components in G − X by a function only depending 
on the parameter k. To this end, we use data reduction rules presented in Section 2.2. 

3. Bound the sizes of the connected components in G − X by a function only depending 
on the parameter k. To this end, we use data reduction rules presented in Section 2.3. 

In Section 2.4, we show that the remaining graph (consisting of the vertices in X 
and the connected components in G − X to which all data reduction rules have been 
applied) contains O(k^2) vertices. Together with the new parameter computed by our 
data reduction rules, this graph constitutes our problem kernel. 

In the following, we use the term solution for a vertex set X such that G − X is a 2-plex 
cluster graph. If we intend to refer to a solution containing at most k vertices, then we 
state it explicitly. 

2.1 An Approximate Solution 

In this section, we present an algorithm that greedily computes an approximate solution 
for 2-Plex Cluster Vertex Deletion. Given a graph G, Guo et al. [11] have shown 
that if G is not an s-plex cluster graph, an O(s + √s)-vertex FISG in G can be found in 
O(s(n + m)) time. For the case s = 2, this algorithm finds the FISGs shown in Figure 1.1. 
We apply their algorithm for s = 2 to construct an initial solution: 

Algorithm 2.1. Given a graph G, we start with H = G and X = ∅. We repeatedly 
apply the algorithm by Guo et al. [11] to find a FISG in H, add its vertices to X, and 
remove them from H. If no FISG can be found, then the algorithm stops and returns X. 
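Algorithm 2.1 can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions: H = G − X is represented implicitly by the set X of deleted vertices, and an exhaustive O(n^4)-time search over 4-vertex subsets stands in for the FISG finder of Guo et al. [11]; the function names are hypothetical.

```python
from itertools import combinations


def _find_fisg_s2(adj, removed):
    """Brute-force stand-in for the FISG finder of Guo et al. (s = 2), O(n^4) time."""
    alive = sorted(v for v in adj if v not in removed)
    for quad in combinations(alive, 4):
        q = frozenset(quad)
        seen, stack = {quad[0]}, [quad[0]]  # DFS inside G[q]
        while stack:
            v = stack.pop()
            for w in adj[v] & q:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen == q and any(len(adj[v] & q) == 1 for v in q):
            return q
    return None


def approximate_solution(adj):
    """Greedy factor-4 approximation in the spirit of Algorithm 2.1."""
    X = set()
    while True:
        fisg = _find_fisg_s2(adj, X)
        if fisg is None:
            return X  # G - X is a 2-plex cluster graph
        X |= fisg  # add all four vertices of the FISG to X
```

On two disjoint paths with four vertices each, the sketch returns all eight vertices, while an optimal solution has two; this shows that the factor 4 of Lemma 2.1 can be attained.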

Figure 2.1 illustrates the separation of G into X and H = G − X. 

Lemma 2.1. Algorithm 2.1 computes a factor-4 approximate solution for 2-Plex Cluster 
Vertex Deletion. It can be carried out in O(n(n + m)) time. 

Proof. First, we show the running time. In each step, a FISG can be found in O(n + m) 
time. Because in each step of Algorithm 2.1 four vertices are removed from H, there are 
at most n/4 ∈ O(n) steps. Therefore, Algorithm 2.1 runs in O(n(n + m)) time. 

It remains to show that the set X computed by Algorithm 2.1 is a factor-4 approximate 
solution. Algorithm 2.1 stops when no more FISGs can be found in H = G − X. Thus, 
H must be a 2-plex cluster graph and X is a solution. 
Let F be the set of all FISGs found by Algorithm 2.1. Because each FISG is deleted 
from H when it is discovered, the graphs in F are pairwise vertex-disjoint. Any solution 
must contain at least one vertex of each FISG in F. Therefore, the size of a solution is at 
least |F|. Each FISG found by the algorithm of Guo et al. [11] contains four vertices. It 
follows that the solution X computed by Algorithm 2.1 contains 4|F| vertices, which is 
at most four times the number of vertices in an optimal solution. □ 

Corollary 2.1. Let (G,k) be a yes-instance. Then, Algorithm 2.1 computes a solution 
for G that contains at most 4k vertices. 

Many of the following observations and data reduction rules require an initial solution X. 
In those observations, we make no assumptions about X other than X being a solution. 
For practical considerations, a heuristic search for an initial solution might be superior to 
employing Algorithm 2.1. Heuristic search might not only be faster, but might also find a 
smaller solution. This is desirable because the size of our problem kernel is proportional 
to the size of the initial solution. However, to obtain a problem kernel with O(k^2) 
vertices, we require an initial constant-factor approximate solution. 

2.2 Bounding the Number of Connected Components 

Let X be a solution for G. In this section, we bound the number of connected components 
in G − X by a function only depending on the parameter k. To this end, we employ a 
data reduction rule that resembles Buss and Goldsmith's [4] kernelization of the Vertex 
Cover problem. 

Lemma 2.2. Let (G, k) be a 2-Plex Cluster Vertex Deletion instance and let 
F(v) be a set of FISGs that pairwise intersect only in the vertex v of G. If |F(v)| > k, 
then (G, k) is a yes-instance if and only if (G − {v}, k − 1) is a yes-instance. 

Proof. If (G, k) is a yes-instance, then there exists a solution S with |S| ≤ k such that 
G − S is a 2-plex cluster graph. If S does not contain v, then, because the FISGs in F(v) 
pairwise share only the vertex v, S contains a distinct vertex for every FISG in F(v). Because there 
are more than k FISGs in F(v), this contradicts |S| ≤ k. Therefore, v ∈ S, the set S \ {v} is a 
solution for G − {v}, and S \ {v} contains at most k − 1 vertices. This shows that (G − {v}, k − 1) is a yes-instance. 

If (G − {v}, k − 1) is a yes-instance, then G − {v} admits a solution S of size at most k − 1. 
The set S ∪ {v} is a solution for G that contains at most k vertices. Thus, (G, k) is a 
yes-instance. □ 
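Lemma 2.2 acts as a data reduction rule. The Python sketch below shows only the bookkeeping of applying it, under our own assumptions: the caller supplies a candidate family F(v) as a list of vertex sets, the sketch checks that each set contains v and that the sets pairwise intersect only in v, and it deletes v if there are more than k of them. It does not check that the given sets really induce FISGs; the name apply_lemma_2_2 is hypothetical.

```python
from itertools import combinations


def apply_lemma_2_2(adj, k, v, fisgs):
    """Apply the reduction of Lemma 2.2 for vertex v, if it is applicable.

    fisgs lists vertex sets of FISGs that all contain v and pairwise
    intersect only in v (the family F(v)); that they are FISGs is assumed.
    If |F(v)| > k, v is deleted and the parameter drops to k - 1;
    otherwise the instance is returned unchanged.
    """
    sets = [frozenset(f) for f in fisgs]
    if any(v not in f for f in sets):
        raise ValueError("every FISG in F(v) must contain v")
    if any((a & b) != {v} for a, b in combinations(sets, 2)):
        raise ValueError("FISGs in F(v) must pairwise intersect only in v")
    if len(sets) <= k:
        return adj, k  # rule does not apply
    reduced = {u: nbrs - {v} for u, nbrs in adj.items() if u != v}
    return reduced, k - 1
```

For example, three induced paths 0-1-2-3, 0-4-5-6, and 0-7-8-9 sharing only the end vertex 0 form a family F(0) of size 3, so for k = 2 the sketch deletes vertex 0 and returns the parameter 1.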

In Section 2.2.1, we introduce the concept of peripheral sets. Given a solution X, 
peripheral sets help us in Section 2.2.2 to bound the number of connected components 
in G − X and help us in Section 2.3 to bound their sizes. We present an algorithm that 
constructs a peripheral set efficiently and enables us to give a lower bound on the number 
of FISGs that pairwise intersect only in a single vertex v ∈ X. If more than k FISGs 
intersect only in v, then we can remove v from G according to Lemma 2.2. 
2.2.1 Peripheral Sets 

In this section, we present an algorithm that, for each vertex v in a solution X, constructs 
a vertex set M(v) that allows us to give a lower bound on the number of FISGs that 
pairwise intersect only in the vertex v. If this lower bound shows that more than k FISGs 
pairwise intersect only in v, then we can remove v from G according to Lemma 2.2. 

As a side effect, we construct the sets M(v) so that their union M := ⋃_{v∈X} M(v) 
helps us to bound the number and the sizes of the connected components in G − X: 
informally speaking, if we remove M from G, then we want each vertex v ∈ X to be 
adjacent to only one large connected component in G − (X ∪ M). As a result, there will 
be at most |X| large connected components in G − (X ∪ M) adjacent to X. Further, if 
a vertex v ∈ X has a neighbor in a connected component in G − (X ∪ M), then we want 
the vertex v to be adjacent to almost all of that connected component's vertices. This 
will help us in Section 2.3 to bound the sizes of the connected components in G − X. We 
later formalize these properties and capture them under the concept of a peripheral set. 

We will see that we can easily bound the size of M by a function only depending on the 
parameter k. Thus, the graph G — M can be thought of as the "core" of our kernelization 
problem, for which we must provide further data reduction rules. In contrast, the vertices 
in M are only of peripheral interest. 

Given a solution X for G, we now construct the set M(v) for each vertex v ∈ X. We 
start with M(v) = ∅. Then, we repeatedly search for a FISG F in G that contains v but 
no vertices from M(v) and add the vertices of F − {v} to M(v). This ensures that we 
only find FISGs that pairwise intersect only in v. To find such FISGs, we present three 
observations on the connected components in G − X. Each observation will lead to a 
phase of an algorithm that constructs the sets M(v). 

Definition 2.1. Let V be the vertex set of G and let X be a solution. We define 
the collection ℋ(X) := {H ⊆ V | H induces a connected component in G − X} of the 
vertex sets of the connected components in G − X. 

Because each set in ℋ(X) induces a connected component in G − X and because X is a 
solution, each set in ℋ(X) induces a 2-plex. 
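The collection ℋ(X), together with the lookup table T used later in the proof of Lemma 2.5, can be computed with one breadth-first search over G − X in O(n + m) time. In this Python sketch (names hypothetical), ℋ(X) is returned as a list H of vertex sets and T maps each vertex outside X to the index of its set in H.

```python
from collections import deque


def component_collection(adj, X):
    """Return (H, T): H lists the vertex sets of the connected components of
    G - X, and T maps every vertex outside X to the index of its set in H."""
    H, T = [], {}
    for start in adj:
        if start in X or start in T:
            continue
        index = len(H)
        component = {start}
        T[start] = index
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in X and w not in T:
                    T[w] = index
                    component.add(w)
                    queue.append(w)
        H.append(component)
    return H, T
```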

We now turn to the first of our three observations. Let v ∈ X be a vertex with 
three neighbors u, w, and t. Assume that u is nonadjacent to w and t, as shown in 
Figure 2.2(a). Then, F := G[{t, u, v, w}] is a connected graph, but u is nonadjacent to the 
two vertices t and w. According to Theorem 1.1, F is a FISG. 

Algorithm 2.2 (Phase 1). Given a graph G and a solution X, initialize M(v) := ∅ for 
each v ∈ X. For each v ∈ X, as long as there are vertices t, u, w ∈ N(v) \ (M(v) ∪ X) 
such that u is adjacent to neither t nor w, add the vertices t, u, and w to M(v). 
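Phase 1 can be sketched in Python as follows, assuming the adjacency-dictionary representation of G and a set X. The sketch returns a dictionary M with M[v] = M(v); neighbors are scanned in sorted order only to make the output deterministic, and the function name phase1 is hypothetical.

```python
def phase1(adj, X):
    """Phase 1 of Algorithm 2.2 (sketch): returns M with M[v] = M(v) for v in X.

    For each v in X, while N(v) minus (M(v) and X) contains vertices t, u, w
    such that u is adjacent to neither t nor w, the three vertices are added
    to M(v). Together with v they induce a FISG (Figure 2.2(a)).
    """
    M = {v: set() for v in X}
    for v in X:
        while True:
            candidates = sorted(x for x in adj[v] if x not in X and x not in M[v])
            triple = None
            for u in candidates:
                non_neighbors = [x for x in candidates if x != u and x not in adj[u]]
                if len(non_neighbors) >= 2:
                    triple = (non_neighbors[0], u, non_neighbors[1])
                    break
            if triple is None:
                break
            M[v].update(triple)
    return M
```

For a vertex v ∈ X with six pairwise nonadjacent neighbors outside X, the sketch finds two FISGs that intersect only in v and puts all six neighbors into M(v).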

(a) FISGs that will be found in Phase 1. 
(b) FISGs that will be found in Phase 2. 
(c) A FISG that will be found in Phase 3. 
(d) FISGs that will not be found. 

Figure 2.2: Each figure shows the graph G with a solution X and FISGs that are found 
in the different phases of Algorithm 2.2. Also compare these FISGs with the FISGs 
shown in Figure 1.1. The vertices u, v, w, and t as used in the algorithm are shown. 
The big circles represent connected components in G − X, that is, they are 2-plexes 
and their vertex sets are sets in ℋ(X). Squares are vertices in the set M(v) for some 
vertex v ∈ X, that is, they are vertices of FISGs that have already been found. 

Now, for each vertex v in the solution X, let M(v) be the set constructed by Phase 1 of 
Algorithm 2.2. For a vertex v ∈ X, assume that there exists a set H ∈ ℋ(X) such that v 
is adjacent to a vertex u ∈ H \ M(v) but nonadjacent to two vertices t, w ∈ H \ M(v). 
This situation is shown in Figure 2.2(b). The graph G[{t, u, w}] is an induced subgraph 
of G[H]. Thus, it is a 2-plex with three vertices, implying that it is connected. Because v is 
adjacent to u, the graph G[{t, u, v, w}] is connected, but v is nonadjacent to the two 
vertices t and w. By Theorem 1.1, G[{t, u, v, w}] is a FISG. We continue Algorithm 2.2 as follows: 

Algorithm 2.2 (Phase 2). For each v ∈ X, as long as there is a set H ∈ ℋ(X) such that 

1. the vertex v is adjacent to a vertex u ∈ H \ M(v) and 

2. the vertex v is nonadjacent to two vertices t, w ∈ H \ M(v), 

add the vertices t, u, and w to M(v). 
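Phase 2 can be sketched in Python as follows. This is a self-contained illustration under our own assumptions: it recomputes the sets of ℋ(X) with a depth-first search, takes the dictionary M produced by Phase 1 as input, extends it in place, and uses sorted orders only for determinism. The function name phase2 is hypothetical.

```python
def phase2(adj, X, M):
    """Phase 2 of Algorithm 2.2 (sketch). M maps each v in X to M(v) after
    Phase 1 and is extended in place; the updated M is also returned."""
    # comp[x] is the vertex set H in H(X) containing x, for every x outside X.
    comp = {}
    for start in adj:
        if start in X or start in comp:
            continue
        component, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in X and y not in component:
                    component.add(y)
                    stack.append(y)
        for x in component:
            comp[x] = component
    for v in X:
        found = True
        while found:
            found = False
            for u in sorted(adj[v]):
                if u in X or u in M[v]:
                    continue
                # vertices t, w in H minus M(v) that are nonadjacent to v
                far = sorted(x for x in comp[u] - M[v] if x not in adj[v])
                if len(far) >= 2:
                    M[v].update((far[0], u, far[1]))
                    found = True
                    break
    return M
```

If v ∈ X is adjacent only to the end vertex 1 of a path 1-2-3 in G − X, the sketch adds 1, 2, and 3 to M(v); together with v they induce a path on four vertices, which is a FISG.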

Now, for each vertex v in a solution X, let M(v) be the set constructed by Phase 1 
and Phase 2 of Algorithm 2.2. Assume that for a vertex v ∈ X, there exist two 
sets U, W ∈ ℋ(X) such that there exist two neighbors u ∈ U \ M(v) and w ∈ W \ M(v) 
of v. This situation is shown in Figure 2.2(c). Assume that U \ M(v) or W \ M(v) 
contains at least three vertices. Without loss of generality, assume that |U \ M(v)| ≥ 3. 
Then, G[U \ M(v)] is a connected 2-plex, because an induced subgraph of a 2-plex is a 
2-plex and every 2-plex with at least three vertices is connected. Therefore, there exists a 
neighbor t ∈ U \ M(v) of u. The vertex w is nonadjacent to t and u, because w is in another 
set in ℋ(X). Because F := G[{t, u, v, w}] is connected, F is a FISG according to Theorem 1.1. 
Figure 2.3: An example for a peripheral set M, which contains the vertices drawn as 
squares. Shown is the graph G with a solution X. The circles represent sets in ℋ(X), 
which induce connected components in G − X. 

Algorithm 2.2 (Phase 3). For each vertex v ∈ X, as long as there are two vertex 
sets U, W ∈ ℋ(X) such that 

1. the vertex v has neighbors u ∈ U \ M(v) and w ∈ W \ M(v) and 

2. there is a neighbor t ∉ X ∪ M(v) of either u or w, 

add the vertices t, u, and w to M(v). Finally, return M(v) for all vertices v ∈ X. 
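Phase 3 can be sketched in Python as follows, again as a self-contained illustration under our own assumptions: each set of ℋ(X) is identified by a representative vertex found by depth-first search, the dictionary M from the first two phases is extended in place, and the function name phase3 is hypothetical. The sketch simply tries all pairs of candidate neighbors; the faster procedure in the proof of Lemma 2.5 exploits Lemma 2.4 instead.

```python
from itertools import combinations


def phase3(adj, X, M):
    """Phase 3 of Algorithm 2.2 (sketch). M maps each v in X to M(v) after
    Phases 1 and 2 and is extended in place; the updated M is also returned."""
    # comp_id[x] identifies the set in H(X) containing x, for every x outside X.
    comp_id = {}
    for start in adj:
        if start in X or start in comp_id:
            continue
        comp_id[start] = start
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in X and y not in comp_id:
                    comp_id[y] = start
                    stack.append(y)
    for v in X:
        found = True
        while found:
            found = False
            outside = sorted(x for x in adj[v] if x not in X and x not in M[v])
            for u, w in combinations(outside, 2):
                if comp_id[u] == comp_id[w]:
                    continue  # u and w must lie in different sets U, W
                t = next((y for y in sorted(adj[u] | adj[w])
                          if y not in X and y not in M[v]), None)
                if t is not None:
                    M[v].update((t, u, w))
                    found = True
                    break
    return M
```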

This concludes the description of Algorithm 2.2. For a solution X, we now inspect the 
union M := ⋃_{v∈X} M(v) of the sets M(v) constructed by Algorithm 2.2. Informally 
speaking, we show that if we remove M from G, then each vertex v ∈ X is adjacent to 
the vertices of at most one large connected component in G − (X ∪ M). As a result, 
there are at most |X| large connected components in G − (X ∪ M) containing neighbors 
of X. Further, we show that if v is adjacent to vertices of a connected component 
in G − (X ∪ M), then it is adjacent to almost all of its vertices. This helps us in 
Section 2.3 to bound the sizes of the connected components in G − X. To formalize these 
properties, we introduce the concept of a peripheral set: 

Definition 2.2. Let X be a solution. We call a vertex set M with the following properties 
peripheral with respect to X: 

1. For each vertex v ∈ X, there are at most two sets H ∈ ℋ(X) such that H \ M is 
adjacent to v. 

2. If there is a vertex v ∈ X and a set H ∈ ℋ(X) such that H \ M is adjacent to v, 
then v is nonadjacent to at most one vertex in H \ M. 

3. For each vertex v ∈ X, if there is more than one set H ∈ ℋ(X) such that H \ M 
is adjacent to v, then each such set H satisfies |H \ M| ≤ 2. 
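The three properties of Definition 2.2 can be checked mechanically, which is convenient for testing an implementation of Algorithm 2.2 on small graphs. The following Python sketch assumes the adjacency-dictionary representation of G and sets X and M; the function name is_peripheral is hypothetical.

```python
def is_peripheral(adj, X, M):
    """Check the three properties of Definition 2.2 for the vertex set M."""
    M = set(M)
    # Vertex sets of the connected components of G - X (the collection H(X)).
    components, seen = [], set(X)
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    component.add(y)
                    stack.append(y)
        components.append(component)
    for v in X:
        # the sets H minus M (H in H(X)) that are adjacent to v
        touched = [H - M for H in components if (H - M) & adj[v]]
        if len(touched) > 2:  # property 1
            return False
        if any(len(R - adj[v]) > 1 for R in touched):  # property 2
            return False
        if len(touched) > 1 and any(len(R) > 2 for R in touched):  # property 3
            return False
    return True
```

For a vertex v ∈ X adjacent only to the end of a path 1-2-3 in G − X, the empty set violates property 2, whereas M = {1, 2, 3} is peripheral.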

For an example, refer to Figure 2.3. In this figure, no vertex in X is adjacent to all 
three sets T \ M, U \ M, and W \ M, as required by Definition 2.2(1). The vertex u is 
adjacent to T \ M and U \ M. There is only one vertex in T \ M that is nonadjacent 
to u, as required by Definition 2.2(2). As required by Definition 2.2(3), the sets T \ M 
and U \ M each contain at most two vertices. The vertex w is adjacent only to W \ M. 
Because W \ M contains more than two vertices, Definition 2.2(3) requires that w be 
adjacent to no other set H \ M with H ∈ ℋ(X). 
Lemma 2.3. Let X be a solution. Let M := ⋃_{v∈X} M(v) be the set constructed by 
Algorithm 2.2. The set M is peripheral with respect to X. 

Proof. We do not directly prove that for each vertex v ∈ X, the set M satisfies the 
properties in Definition 2.2. Instead, we show for each vertex v ∈ X that the set M(v) 
satisfies them. Because M(v) ⊆ M for all v ∈ X and each property is preserved when 
further vertices are removed from the sets H \ M(v), this is sufficient. We show the 
properties separately. 

(1) Assume that there exists a vertex v ∈ X and three sets T, U, W ∈ ℋ(X) such 
that v has the neighbors t ∈ T \ M(v), u ∈ U \ M(v), and w ∈ W \ M(v). This case is 
illustrated for the vertex v' in Figure 2.2(a). Because the vertices t, u, and w come from 
different connected components in G − X, they are pairwise nonadjacent. Phase 1 of 
Algorithm 2.2 would have added t, u, and w to M(v). This contradicts the assumption 
that t ∈ T \ M(v), u ∈ U \ M(v), and w ∈ W \ M(v). This shows the first property. 

(2) Assume that there exists a set H ∈ ℋ(X) such that the vertex v ∈ X is adjacent 
to the vertex u ∈ H \ M(v) and v is nonadjacent to the vertices t ∈ H \ M(v) 
and w ∈ H \ M(v). This is illustrated in Figure 2.2(b). Phase 2 of Algorithm 2.2 
would have added the vertices t, u, and w to M(v). This contradicts the assumption 
that t, u, w ∈ H \ M(v). This shows the second property. 

(3) Assume that there exist two sets U, W ∈ ℋ(X) such that a vertex v ∈ X has 
the neighbors u ∈ U \ M(v) and w ∈ W \ M(v). Without loss of generality, assume 
that |U \ M(v)| > 2. This situation is shown in Figure 2.2(c). Because |U \ M(v)| > 2, 
the 2-plex G[U \ M(v)] is connected. Therefore, the vertex u has a neighbor t ∈ U \ M(v). 
Phase 3 of Algorithm 2.2 would have added t, u, and w to M(v). This contradicts the 
assumption that u ∈ U \ M(v) and w ∈ W \ M(v). To fully prove the third property, 
one can show |W \ M(v)| ≤ 2 analogously. □ 

In the following, we provide a more detailed view on the execution steps of Algorithm 2.2 
and also analyze its running time. The following lemma enables us to execute Phase 3 of 
Algorithm 2.2 quickly. 

Lemma 2.4. Let X be a solution. For each vertex v ∈ X, let M(v) be the set constructed 
by Phase 1 of Algorithm 2.2. If there exists a vertex v ∈ X and two sets U, W ∈ ℋ(X) 
such that U \ M(v) and W \ M(v) are adjacent to v, then |N(v) \ (M(v) ∪ X)| = 2. 

Proof. Assume that the vertex v ∈ X has three neighbors t, u, w ∉ M(v) ∪ X, as 
shown in Figure 2.2(a). According to the proof of Lemma 2.3, there are at most two 
sets U, W ∈ ℋ(X) such that v is adjacent to U \ M(v) and W \ M(v). Without loss of 
generality, assume that t, w ∈ W \ M(v) and u ∈ U \ M(v). The vertices t, u, and w are 
neighbors of v and u is adjacent to neither t nor w. Phase 1 of Algorithm 2.2 would have 
added t, u, and w to M(v). This contradicts the assumption that t, u, w ∉ M(v) ∪ X. 
Hence, |N(v) \ (M(v) ∪ X)| ≤ 2; since v has a neighbor in each of U \ M(v) and W \ M(v), 
equality holds. □ 

Lemma 2.5. Given a solution X, Algorithm 2.2 can be carried out in O(|X| n^2) time. 

Proof. Given a graph G and a solution X, we first compute the graph G − X in 
O(n + m) time. We can then compute the collection ℋ(X) of vertex sets of the connected 
components in G − X. This can be done in O(n + m) time using breadth-first search. 
During the construction of ℋ(X), we construct a table T that stores, for each vertex u ∉ X, 
the set H ∈ ℋ(X) with u ∈ H. We assume that set membership can be tested in constant 
time and that elements can be added to sets in constant time. For each vertex v ∈ X, 
we now execute the three phases: 

In Phase 1, we construct the set N(v) ∖ (M(v) ∪ X) in O(n) time. For each vertex u ∈ N(v) ∖ (M(v) ∪ X), we scan the set N(v) ∖ (M(v) ∪ X) again to find two vertices nonadjacent to u. Therefore, Phase 1 runs in O(n²) time for each vertex v ∈ X.

In Phase 2, for each u ∈ N(v), we can (using the table T) find the set H ∈ 𝓗(X) with u ∈ H in constant time. If u ∈ X or u ∈ M(v), then we proceed with the next u ∈ N(v). Otherwise, we scan H ∖ M(v) in O(n) time for two vertices that are nonadjacent to v. The running time for one vertex u ∈ N(v) is thus O(n), resulting in a running time of O(n²) for each v ∈ X.

In Phase 3, we first construct the set N(v) ∖ (M(v) ∪ X) in O(n) time. According to Lemma 2.4, if |N(v) ∖ (M(v) ∪ X)| ≠ 2, then there is at most one set H ∈ 𝓗(X) such that H ∖ M(v) is adjacent to v. In this case, we continue with the next v ∈ X. Otherwise, let u, w ∈ N(v) ∖ (M(v) ∪ X). In constant time, we check (using the table T) whether the vertices u and w are in different sets in 𝓗(X). If so, we scan the neighborhoods of u and w for a vertex t ∉ X ∪ M(v) in O(n) time. Thus, the total running time of Phase 3 is O(n) for each v. Summing over the three phases and all v ∈ X, Algorithm 2.2 has a worst-case running time of O(|X|n²). □

Note that, given a vertex v of a solution X, Algorithm 2.2 only finds a FISG F containing v if the vertices in F − {v} are neighbors of v or if at least two vertices of F − {v} are in distinct connected components in G − X. This is not the case for the FISGs shown in Figure 2.2(d). Thus, Algorithm 2.2 does not necessarily find them. We could search for these FISGs, but this would presumably increase the asymptotic running time of Algorithm 2.2, and it would not improve the worst-case size of our problem kernel.

2.2.2 Reducing the Number of Connected Components 

In this section, given a solution X for the graph G, we present data reduction rules to bound the number of connected components in G − X by a function depending only on the parameter k. To this end, we bound the size of the peripheral set constructed by Algorithm 2.2 using the following data reduction rule, which is based on Lemma 2.2.

Reduction Rule 2.1. Let X be a solution. For each vertex v ∈ X, let M(v) be the set constructed by Algorithm 2.2. If there exists a vertex v ∈ X such that |M(v)| > 3k, then delete v from G and X and decrement k by one.

Lemma 2.6. Reduction Rule 2.1 is correct. Given a solution X and the set M(v) constructed by Algorithm 2.2 for each vertex v ∈ X, we can exhaustively apply Reduction Rule 2.1 in O(|X|n + m) time.

Proof. If Algorithm 2.2 adds vertices to M(v) for a vertex v ∈ X, then it has found a FISG that contains no vertices from M(v). That is, apart from v, this FISG does not contain vertices from previously found FISGs. Thus, if |M(v)| > 3k, then M(v) contains vertices of more than k FISGs that pairwise intersect only in the vertex v. According to Lemma 2.2, we can delete v from G and decrement the parameter k by one. For each vertex v ∈ X, the elements in M(v) can be counted in O(n) time. The deletion of all vertices v ∈ X with |M(v)| > 3k is possible in O(n + m) time. □

Observe that for each vertex v in a solution X, Reduction Rule 2.1 does not change the set M(v) constructed by Algorithm 2.2. Also, the graph G − X is invariant under Reduction Rule 2.1; so is the set 𝓗(X). We can conclude that, after we have applied Reduction Rule 2.1 to G and X, the proof of Lemma 2.3 is still valid and shows that the set ⋃_{v∈X} M(v) is still peripheral by Definition 2.2. Therefore, Reduction Rule 2.1 not only reduces the size of G and X; it also yields a smaller peripheral set, because after the exhaustive application of Reduction Rule 2.1, for each vertex v ∈ X, the set M(v) contains at most 3k vertices.

Corollary 2.2. Let X be a solution for G. For each vertex v ∈ X, let M(v) be the set constructed by Algorithm 2.2. After exhaustively applying Reduction Rule 2.1 to G and X, the peripheral set M := ⋃_{v∈X} M(v) contains at most 3k|X| vertices.

Now that we have bounded the size of the peripheral set, we can, given a solution X, bound the number of connected components in G − X. First, we remove connected components of G − X, which are induced by the vertex sets in 𝓗(X), according to the following data reduction rule. Then, we use a peripheral set to bound the number of the remaining connected components.

Reduction Rule 2.2. Let X be a solution. If there exists a set H ∈ 𝓗(X) that is nonadjacent to X, then remove the vertices in H from G.

Lemma 2.7. Reduction Rule 2.2 is correct. Given a solution X, we can exhaustively apply Reduction Rule 2.2 in O(n + m) time.

Proof. Let H ∈ 𝓗(X) be the set of vertices chosen for removal by Reduction Rule 2.2 and let G' := G − H. To prove the correctness of Reduction Rule 2.2, we have to show that (G', k) is a yes-instance if and only if (G, k) is a yes-instance. If (G, k) is a yes-instance, then there exists a solution S with |S| ≤ k for G. Since G − S is a 2-plex cluster graph, its induced subgraph G' − S is a 2-plex cluster graph as well. Thus, (G', k) is a yes-instance.

If (G', k) is a yes-instance, then there exists a solution S with |S| ≤ k for G'. Because Reduction Rule 2.2 chooses to remove the vertices in H from G, the set H is nonadjacent to the solution X. Therefore, H induces an isolated 2-plex in G, so it cannot contain vertices of a FISG. Thus, G − S is also a 2-plex cluster graph and (G, k) is a yes-instance.

Considering the running time, we can obtain the set 𝓗(X) in O(n + m) time. During the construction of 𝓗(X), we use a table T to store, for each vertex u, the set H ∈ 𝓗(X) with u ∈ H. We have already used this technique in the proof of Lemma 2.5. We construct a further table T', with all entries initially 0, as follows: for each vertex v ∈ X and for each vertex u ∈ N(v) ∖ X, we set T'[T[u]] = 1. This can be done in O(n + m) time. Then, the sets H ∈ 𝓗(X) with T'[H] = 0 are known to have no neighbor in X. These can be removed from G in O(n + m) time. □
Figure 2.4: A solution X and a vertex set M. The big circles represent sets in 𝓗(X) or, equivalently, connected components in G − X.

Given a solution X and a vertex set M, there are two possible scenarios for a connected component in G − X. Consider the vertex set H ∈ 𝓗(X) of such a connected component. As shown in Figure 2.4(a), it might be the case that the edges between the set H ∩ M and the solution X separate the vertices in H from the vertices in X. That is, the set H ∖ M might be nonadjacent to X. As shown in Figure 2.4(b), it might also be the case that for a set H ∈ 𝓗(X), the set H ∖ M is adjacent to X. According to Definition 2.2(1), if M is peripheral, then there are at most 2|X| sets H ∈ 𝓗(X) such that H ∖ M is adjacent to X. To bound the total number of connected components in G − X, it remains to bound the number of sets H ∈ 𝓗(X) such that H ∖ M is nonadjacent to X.

Lemma 2.8. Let X be a solution and let M be a vertex set. After applying Reduction Rule 2.2, there are at most |M| sets H ∈ 𝓗(X) such that H ∖ M is nonadjacent to X.

Proof. Let H ∈ 𝓗(X) be such that H ∖ M is nonadjacent to the solution X. Because Reduction Rule 2.2 has been applied, the set H must be adjacent to X; otherwise, Reduction Rule 2.2 would have removed H. Because the set H ∖ M is nonadjacent to X, the set H must contain a vertex from M that is adjacent to X. Because a vertex in M can be contained in only one set in 𝓗(X), there can be at most |M| sets H ∈ 𝓗(X) such that H ∖ M is nonadjacent to X. □

Given a solution X and a peripheral set M, we conclude from Definition 2.2(1) and Lemma 2.8 that the number of connected components in G − X is at most 2|X| + |M|.

2.3 Bounding the Sizes of Connected Components 

In this section, given a solution X for G, we bound the sizes of the connected components in G − X by functions depending only on the parameter k. Because we have already bounded the size of X and the number of connected components in G − X, this will finally lead to a problem kernel, as discussed at the beginning of Chapter 2. In Section 2.3.1, we present a generalization of the module concept [9, 19]. Based on this, we develop a data reduction rule to reduce the sizes of the connected components in G − X.
(a) Graph prior to reduction. (b) Removed w and x: valid data reduction. (c) Removed u: wrong data reduction.

Figure 2.5: In each displayed graph, the vertices drawn as squares form an X-module.

Section 2.3.2 deals with the efficient execution of this data reduction rule and uses a peripheral set M to bound the sizes of the connected components. In Section 2.3.3, we present an additional data reduction rule that is only applicable to connected components induced by sets H ∈ 𝓗(X) such that H ∖ M is nonadjacent to X. We have already handled this type of connected component separately in Section 2.2.2, where we bounded the number of connected components in G − X. We use the fact that the edges between the set H ∩ M and the solution X separate the vertices in H from the vertices in X, as shown in Figure 2.4(a). We will see that the additional data reduction rule presented in Section 2.3.3 is necessary to obtain an O(k²)-vertex problem kernel.

2.3.1 Data Reduction Based on Modules 

Given a solution X, we now develop a characterization of vertices that can be removed from the connected components in G − X. This characterization is based on so-called modules [9, 19]. For a graph with the vertex set V, a vertex subset Z ⊆ V is called a module if any two vertices u, v ∈ Z satisfy N(v) ∖ Z = N(u) ∖ Z. That is, a vertex not in Z is adjacent either to all or to no vertices in Z. For example, the two vertices w and x in Figure 2.5(a) form a module. Modules also serve as the basis of the critical clique concept introduced by Guo [10] to kernelize the Cluster Editing problem.

Given a vertex set W ⊆ V, we generalize the module concept and introduce the W-module. We call a vertex set Z ⊆ V a W-module if any two vertices u, v ∈ Z satisfy N(u) ∩ W = N(v) ∩ W. That is, a vertex in W is adjacent either to all or to no vertices in Z. Figure 2.5 shows examples of X-modules. Observe that if Z ⊆ V is a (V ∖ Z)-module, then Z is a module, because N(v) ∩ (V ∖ Z) = N(v) ∖ Z. Every subset of a W-module is again a W-module.

For a graph G and a solution X, we use the fact that the vertices in an X-module are equivalent with respect to their neighborhood in X. The idea is, informally, to represent a large X-module by one of its subsets and to replace the X-module by its representative. Consider the following example, which also shows that we cannot choose an arbitrary subset of an X-module as representative: the graph shown in Figure 2.5(a), call it G', requires one vertex deletion to transform it into a 2-plex cluster graph. The vertices u, w, and x are part of an X-module. Observe that G' − {w, x}, shown in Figure 2.5(b), also requires one vertex deletion to transform it into a 2-plex cluster graph. It follows that (G', k) is a yes-instance if and only if (G' − {w, x}, k) is. Therefore, it is valid to remove w and x from G' to obtain the graph shown in Figure 2.5(b). In contrast, the graph G' − {u} shown in Figure 2.5(c) is a 2-plex cluster graph. Because G' − {u} can be transformed into a 2-plex cluster graph with fewer vertex deletions than G', we may not remove u from G'. To circumvent this problem, we impose a constraint on the vertices that may be removed from an X-module in G. Recall that the connected components in G − X are induced by the vertex sets in 𝓗(X).

Definition 2.3. Let X be a solution. For H ∈ 𝓗(X), let R(H) ⊆ H be an X-module. We call R(H) redundant if there exists an X-module Z(H) with R(H) ⊆ Z(H) ⊆ H that contains all vertices from H that are nonadjacent to a vertex in R(H).

Reduction Rule 2.3. Let X be a solution, let H ∈ 𝓗(X), and let R(H) be a redundant subset of H. If |R(H)| > k + 3, then choose an arbitrary vertex from R(H) and remove it from G.

In Section 2.3.2, we construct a redundant set R(H) for each vertex set H ∈ 𝓗(X) so that we can bound the size of H ∖ R(H). Using Reduction Rule 2.3, we can then bound the size of R(H). To prove the correctness of the above data reduction rule, we assume that Reduction Rule 2.3 chooses to remove a vertex u from G and show that (G, k) is a yes-instance if and only if (G − {u}, k) is a yes-instance. To this end, we need three further observations, which we present in the following lemmas.

Lemma 2.9. Let Q be an arbitrary graph and let v be a vertex of Q. If Q − {v}, but not Q, is a 2-plex cluster graph, then Q contains a FISG including the vertex v.

Proof. Because Q is not a 2-plex cluster graph, it contains a FISG. If no FISG in Q contained v, then removing v from Q would destroy no FISG. Thus, Q − {v} would not be a 2-plex cluster graph, contradicting our assumption. □

In addition to the assumption that Reduction Rule 2.3 chooses to remove a vertex u from G, we now assume that (G − {u}, k) is a yes-instance and show two further lemmas. Finally, we prove the correctness of Reduction Rule 2.3.

Assumption 2.1. Let X be a solution and let R(H) be a redundant subset of H ∈ 𝓗(X). Assume that Reduction Rule 2.3 chooses to remove the vertex u ∈ R(H) from G. Further, assume that (G − {u}, k) is a yes-instance, that is, that there exists a solution S with |S| ≤ k for the graph G − {u}.

In the following, we write G' for G − {u}. Because we assume that Reduction Rule 2.3 chooses to remove u from R(H), the set R(H) must contain more than k + 3 vertices, which implies |R(H) ∖ (S ∪ {u})| ≥ 3. Because G[H] is a 2-plex, G[R(H) ∖ (S ∪ {u})] is a 2-plex containing at least three vertices. We can conclude that G[R(H) ∖ (S ∪ {u})] is connected, since in a 2-plex two nonadjacent vertices are both adjacent to every other vertex. The graph G[H ∖ (S ∪ {u})] is connected for the same reason.

Lemma 2.10. Under Assumption 2.1, let G − S contain a FISG F including u. Then, in G' − S, the vertices of F − {u} are connected to all vertices in H ∖ (S ∪ {u}).
(b) Case w ∈ X. Because R(H) ∖ S is an …

Figure 2.6: The vertices u, v, and w are named as in the proof of Lemma 2.10. Note that in either case, v is connected to all vertices in H ∖ (S ∪ {u}) even if u is removed. Also note that the vertex v is not necessarily in X.

Proof. Let v be a vertex of F − {u}. Because F is connected, there exists a path in G − S connecting v to u. This path has to use a neighbor w of u (possibly, v = w). We now distinguish between the two cases w ∈ H ∖ S and w ∉ H ∖ S.

As observed after Assumption 2.1, G[H ∖ (S ∪ {u})] is connected. So if w ∈ H ∖ S, as shown in Figure 2.6(a), then w is connected to every other vertex in H ∖ (S ∪ {u}). That is, w connects v to the vertices in H ∖ (S ∪ {u}) even when u is removed.

Because w is in G − S, we have w ∉ S. That is, if w ∉ H ∖ S, then w ∉ H. Because w is adjacent to u ∈ R(H) and because there are no edges between distinct sets in 𝓗(X), we have w ∈ X, as shown in Figure 2.6(b). Because u is a neighbor of w ∈ X and u is in the X-module R(H), it follows that all vertices in R(H) ∖ (S ∪ {u}) are neighbors of w in G' − S. So w connects v to the vertices in H ∖ (S ∪ {u}) even when u is removed. □

Lemma 2.11. Under Assumption 2.1, let Z(H) be an X-module with R(H) ⊆ Z(H) ⊆ H and let F be a FISG in G − S including u. If a vertex v of F is nonadjacent to a vertex w ∈ Z(H) ∖ S, then v ∈ H ∖ S.

Proof. Assume that a vertex v ∉ H ∖ S of F is nonadjacent to the vertex w ∈ Z(H) ∖ S. This situation is shown in Figure 2.7. Because v is in G − S, we have v ∉ S and therefore v ∉ H. We first show that v is nonadjacent to the X-module Z(H) ∖ S.

Assume that v is adjacent to the X-module Z(H) ∖ S. This implies v ∈ X, because there are no edges between distinct sets in 𝓗(X) and v ∉ H. Because w is in the X-module Z(H) ∖ S and because v ∈ X is adjacent to Z(H) ∖ S, the vertex v must also be adjacent to w. By our assumption, this is not the case, so v is nonadjacent to the X-module Z(H) ∖ S. In particular, v is nonadjacent to its subset R(H) ∖ (S ∪ {u}).

According to Lemma 2.10, the vertex v is connected to all vertices in R(H) ∖ (S ∪ {u}) in G' − S. As observed after Assumption 2.1, there are at least three vertices in R(H) ∖ (S ∪ {u}). These are connected but nonadjacent to v in G' − S. By Theorem 1.1, this implies that there exists a FISG in G' − S, contradicting Assumption 2.1. □

Figure 2.7: Because v ∈ X is nonadjacent to the vertex w of the X-module Z(H) ∖ S, the vertex v cannot be adjacent to any vertex in Z(H) ∖ S. These are more than three vertices. But v is connected to all vertices in H ∖ S, including Z(H) ∖ S.

Lemma 2.12. Reduction Rule 2.3 is correct. 

Proof. Assume that Reduction Rule 2.3 chooses to remove a vertex u from G. Let G' denote the graph G − {u}. We have to show that (G, k) is a yes-instance if and only if (G', k) is a yes-instance. If (G, k) is a yes-instance, then there exists a solution S with |S| ≤ k such that G − S is a 2-plex cluster graph. Then, G' − S is also a 2-plex cluster graph and (G', k) is a yes-instance.

If (G', k) is a yes-instance, then there exists a solution S with |S| ≤ k such that G' − S is a 2-plex cluster graph; that is, Assumption 2.1 holds. Assume that G − S contains a FISG. By Lemma 2.9, there exists a FISG F in G − S containing the vertex u. Because F is a FISG, it contains a vertex v that is connected but nonadjacent to two vertices w, x in F.

If u ∉ {v, w, x}, then Lemma 2.10 shows that the vertices v, w, x are connected to all vertices in H ∖ (S ∪ {u}) in G' − S. Thus, the vertices v, w, and x would exist in G' − S and would be connected. That contradicts G' − S being a 2-plex cluster graph, because v is nonadjacent but connected to the vertices w and x. Thus, u must be one of v, w, or x.

First, assume that u = v. That is, the vertex u ∈ R(H) is nonadjacent to the vertices w and x. From Lemma 2.11, we can conclude that w, x ∈ H ∖ S. Because also u ∈ H ∖ S, the vertex u is nonadjacent to two vertices of G[H ∖ S], which contradicts G[H ∖ S] being a 2-plex. So u must be either w or x.

Without loss of generality, assume that u = w. That is, the vertex u ∈ R(H) is nonadjacent to v. By Lemma 2.11, we have v ∈ H ∖ S. By Definition 2.3, there exists an X-module Z(H) with R(H) ⊆ Z(H) ⊆ H and v ∈ Z(H), because the vertex v ∈ H ∖ S is nonadjacent to the vertex u ∈ R(H). But then, because the vertex v ∈ Z(H) is nonadjacent to x, the vertex x must also be in H ∖ S by Lemma 2.11. Since v is nonadjacent to both u and x, this again contradicts G[H ∖ S] being a 2-plex. We conclude that G − S must be a 2-plex cluster graph. Thus, (G, k) is a yes-instance. □
2.3.2 Constructing Redundant Sets

In this section, we show how to efficiently find redundant sets as defined in Definition 2.3. Our goal is, given a solution X and the vertex set H ∈ 𝓗(X) of a connected component in G − X, to construct a redundant subset R(H) ⊆ H so that the size of H ∖ R(H) is bounded by a function depending only on the parameter k. Then, we can apply Reduction Rule 2.3 to R(H) to bound the overall size of H.

To this end, we employ a peripheral set M. Using Corollary 2.2, we can bound the size of M by 3k|X|. Thus, for each H ∈ 𝓗(X), we only need to bound the size of the set H ∖ M. Definition 2.2(2) for peripheral sets guarantees that if a vertex v ∈ X is adjacent to H ∖ M, then there is at most one vertex in H ∖ M that is nonadjacent to the vertex v. Thus, the number of vertices in H ∖ M that are nonadjacent to a vertex in N(H ∖ M) ∩ X cannot exceed |X|. The size of X is in turn bounded by 4k in Corollary 2.1. It follows that we only have to bound the number of vertices in H ∖ M that are adjacent to all vertices in N(H ∖ M) ∩ X. We show that we can obtain a redundant set from such vertices by employing the following algorithm:

Algorithm 2.3. Given a set M that is peripheral with respect to a solution X, for each H ∈ 𝓗(X), first find all vertices belonging to H ∩ M and N(H ∖ M) ∩ X. Then, construct the sets

A(H) := {u ∈ H | ∃w ∈ H ∩ M : u is nonadjacent to w},

B(H) := {u ∈ H | ∃w ∈ N(H ∖ M) ∩ X : u is nonadjacent to w}, and

C(H) := {u ∈ H | ∃w ∈ B(H) : u is nonadjacent to w}.

Return R(H) := H ∖ R̄(H), where R̄(H) := A(H) ∪ B(H) ∪ C(H) ∪ (H ∩ M).

Lemma 2.13. Given a set M that is peripheral with respect to a solution X, for H ∈ 𝓗(X), let R(H) be the set constructed by Algorithm 2.3. The set R(H) is redundant.

Proof. According to Definition 2.3, we have to show that there exists an X-module Z(H) with R(H) ⊆ Z(H) ⊆ H that contains all vertices in H that are nonadjacent to a vertex in R(H). Because G[H] is a 2-plex, we could choose Z(H) := R(H). But with s-plexes in mind, we present a proof that does not rely on the fact that G[H] is a 2-plex.

Consider the set Z(H) := {u ∈ H ∖ M | N(u) ∩ X = N(H ∖ M) ∩ X}. For any two vertices u, v ∈ Z(H), we have N(u) ∩ X = N(H ∖ M) ∩ X = N(v) ∩ X. Thus, the set Z(H) ⊆ H is an X-module. To show that a vertex u is in Z(H), it is sufficient to show u ∈ H ∖ M and N(H ∖ M) ∩ X ⊆ N(u) ∩ X. The opposite inclusion N(H ∖ M) ∩ X ⊇ N(u) ∩ X follows directly from u ∈ H ∖ M.

We first show that R(H) ⊆ Z(H). Because R(H) ∩ M = ∅, every vertex in R(H) is in H ∖ M. Because R(H) ∩ B(H) = ∅, each vertex u ∈ R(H) is adjacent to every vertex w ∈ N(H ∖ M) ∩ X; otherwise, u would be in B(H). From this, we can conclude that N(H ∖ M) ∩ X ⊆ N(u) ∩ X. This implies u ∈ Z(H).

Now assume that there exist a vertex u ∈ R(H) and a vertex w ∈ H such that u and w are nonadjacent. From R(H) ∩ A(H) = ∅, it follows that w ∉ M; otherwise, u would be in A(H). Because R(H) ∩ C(H) = ∅, the vertex w is adjacent to every vertex v ∈ N(H ∖ M) ∩ X; otherwise, w ∈ B(H) and therefore u ∈ C(H). Thus, we have w ∈ H ∖ M and N(H ∖ M) ∩ X ⊆ N(w) ∩ X, that is, w ∈ Z(H). □



Lemma 2.14. Given a set M that is peripheral with respect to a solution X, Algorithm 2.3 can be carried out in O(n²) time.

Proof. Observe that we can construct the set 𝓗(X) in O(n + m) time. During the construction of 𝓗(X), we use a table T to store, for each vertex u, the set H ∈ 𝓗(X) with u ∈ H. We now scan each H ∈ 𝓗(X) in four passes, classifying each vertex u ∈ H as follows:

The first pass constructs the sets H ∩ M and N(H ∖ M) ∩ X. If u ∈ M, we memorize the vertex u as belonging to H ∩ M. If u ∉ M, we memorize its neighbors in X as belonging to N(H ∖ M) ∩ X. Finding u's neighbors in X can take O(|X|) time.

The second pass constructs the sets A(H) and B(H) with the results from the first pass as follows: if the vertex u is nonadjacent to a vertex in H ∩ M, then add u to A(H). This works in O(|H ∩ M|) time. If the vertex u is nonadjacent to a vertex in N(H ∖ M) ∩ X, which can be checked in O(|X|) time, then add u to B(H).

The third pass is similar to the second pass and constructs C(H) from B(H) in O(|B(H)|) time for each vertex u. In a final pass, we add all vertices u that are not in A(H), B(H), C(H), or M to R(H). This can be done in constant time for each vertex u.

Altogether, we spend O(n) time on each of the at most n vertices encountered while scanning the sets H ∈ 𝓗(X), yielding a total running time of O(n²). □

Lemma 2.15. Given a set M that is peripheral with respect to a solution X, we can exhaustively apply Reduction Rule 2.3 in O(n²) time.

Proof. We first use Algorithm 2.3 on the sets X and M to construct the sets R(H) for all H ∈ 𝓗(X) in O(n²) time (Lemma 2.14). According to Lemma 2.13, these sets are redundant. Thus, Reduction Rule 2.3 can be applied.

Observe that after Reduction Rule 2.3 removes a vertex u ∈ R(H) from G, the set R(H) ∖ {u} is still redundant. Thus, we can remove a whole subset of R(H) from G without constructing new redundant sets between vertex deletions.

For each H ∈ 𝓗(X), we can count the number of vertices in R(H) in O(|R(H)|) time. Removing a set of vertices works in O(n + m) time. □

Given an instance (G, k) and a solution X for G, we can now bound the sizes of the connected components in G − X by a function that depends only on the parameter k.

Lemma 2.16. Let the set M be peripheral with respect to a solution X. For a set H ∈ 𝓗(X), let R(H) be the redundant subset constructed by Algorithm 2.3. After exhaustively applying Reduction Rule 2.3 using R(H), the number of vertices in H ∖ M is at most |H ∩ M| + 2|N(H ∖ M) ∩ X| + k + 3.

Proof. To prove the above lemma, we study the sets constructed in Algorithm 2.3. By construction of R(H), we have R(H) = H ∖ R̄(H). Observe that because R̄(H) ⊆ H, we also have H ∖ R(H) = R̄(H). Because G[H] is a 2-plex, for every vertex w ∈ H ∩ M there exists at most one vertex u ∈ H such that u and w are nonadjacent. Thus, we have |A(H)| ≤ |H ∩ M|. Because M is peripheral, we can conclude from Definition 2.2(2) that for each vertex w ∈ N(H ∖ M) ∩ X, there is at most one vertex u ∈ H ∖ M such that u and w are nonadjacent. If N(H ∖ M) ∩ X = ∅, then B(H) = ∅. Thus, we have |B(H)| ≤ |N(H ∖ M) ∩ X|. Now, again because G[H] is a 2-plex, for every vertex w ∈ B(H) there exists at most one vertex u ∈ H such that u and w are nonadjacent. Thus, we have |C(H)| ≤ |B(H)| ≤ |N(H ∖ M) ∩ X|. This shows that the number of vertices in H ∖ (R(H) ∪ M) cannot exceed |H ∩ M| + 2|N(H ∖ M) ∩ X|. To get the total number of vertices in H ∖ M, we must add |R(H)|. Reduction Rule 2.3 bounds |R(H)| by k + 3. □

2.3.3 Data Reduction Based on Separators 

In the previous section, we have, given a solution X, bounded the sizes of the connected components in G − X. Given a peripheral set M, we now present an additional data reduction rule to further reduce the sizes of the connected components induced by vertex sets from the collection 𝓗₀(X, M) := {H ∈ 𝓗(X) | H ∖ M is nonadjacent to X}. The vertices in a set H ∈ 𝓗₀(X, M) are separated from the vertices in the solution X by the edges between M and X, as shown in Figure 2.4(a). Figure 2.4(b) shows an example of a vertex set that is not in 𝓗₀(X, M). The following observation makes clear why an additional data reduction rule for sets in 𝓗₀(X, M) is necessary.

According to Corollary 2.2, if k is our parameter, we can employ Reduction Rule 2.1 to obtain a peripheral set M containing at most 3k|X| vertices. By Lemma 2.8, exhaustively applying Reduction Rule 2.2 gives us a bound of |M| on the number of sets in 𝓗₀(X, M). Since |M| ≤ 3k|X|, if we only bound the size of each set in 𝓗₀(X, M) by a function linear in k, then the total number of vertices in sets in 𝓗₀(X, M) is O(|X|k²). To conclude an O(|X|k)-vertex problem kernel, we have to provide a data reduction rule in addition to Reduction Rule 2.3.

For each connected component in G − X that is induced by a set H ∈ 𝓗₀(X, M), we now bound |H| by a function linear in |H ∩ M|. Thus, we effectively bound the total number of vertices in sets in 𝓗₀(X, M) by O(|M|). Observe that since X is a solution, every FISG that contains a vertex from a set H ∈ 𝓗₀(X, M) must also contain a vertex from X. Because H ∖ M is nonadjacent to X, such a FISG F must also contain a vertex from H ∩ M. The following data reduction rule is based on the idea that if H ∖ M is too large and contains vertices of FISGs, then we can find a small solution containing the vertices in H ∩ M.

Reduction Rule 2.4. Let X be a solution and let H ∈ 𝓗(X). Given a vertex set M such that H ∖ M is nonadjacent to X, if |H ∖ M| > |H ∩ M| + 1, then choose a vertex from H ∖ M and remove it from G.

To prove the correctness of this data reduction rule, we need a series of observations. To 
this end, we use the following definition: 

Definition 2.4. For two vertex sets U and W, we introduce the set E(U, W) of edges between U and W. That is, E(U, W) = {{u, w} | u ∈ U and w ∈ W are adjacent in G}. We say that a solution destroys an edge e if the solution contains a vertex incident to e.



Figure 2.8: Empty squares are the vertices in the set M. Filled squares are in the solution S. Dashed edges are destroyed by S. Note that there are no edges from X to vertices in H ∖ M. Shown are FISGs that result if a solution S does not destroy an edge e ∈ E(H, X) and if S does not contain all but one vertex in H ∖ M.

For a solution X and the vertex set H ∈ 𝓗₀(X, M) of a connected component in G − X, the edges in E(H, X) separate the vertices in H from the vertices in X. This is shown in Figure 2.8. If a solution S destroys all edges in E(H, X), then G[H ∖ S] is an isolated 2-plex.

Lemma 2.17. Let S and X be solutions. Assume that there are a vertex set M and a set H ∈ 𝓗(X) such that H ∖ M is nonadjacent to X. If S does not destroy all edges in E(H, X), then it contains at least |H ∖ M| − 1 vertices from H ∖ M.

Proof. Because the solution S does not destroy all edges in E(H, X), there must exist an edge e ∈ E(H ∖ S, X ∖ S). Now assume that S does not contain two distinct vertices u, w ∈ H ∖ M, as shown in Figure 2.8. Because u, w ∉ M and because H ∖ M is nonadjacent to X, the vertex v ∈ X ∖ S incident to the edge e cannot be adjacent to the vertices u, w ∈ H ∖ (S ∪ M). But H ∖ S contains at least three vertices: u, w, and at least one vertex from H ∩ M, namely the endpoint of e in H. Since G[H ∖ S] is a 2-plex with at least three vertices, it is connected, so the vertex v is connected but nonadjacent to u and w. We can conclude from Theorem 1.1 that they are part of a FISG. This contradicts S being a solution. □

Lemma 2.18. Let S and X be solutions. Assume that there are a vertex set M and a set H ∈ 𝓗(X) such that H ∖ M is nonadjacent to X. If |H ∖ M| > |H ∩ M| + 1, then there exists a solution S' with |S'| ≤ |S| that destroys all edges in E(H, X).

Proof. Assume that S does not destroy all edges in E(H, X); otherwise, S' := S suffices. From Lemma 2.17 and from |H ∖ M| > |H ∩ M| + 1, we can conclude that there are at least |H ∩ M| vertices from H ∖ M in S. The set S' := (S ∪ (H ∩ M)) ∖ (H ∖ M) destroys all edges in E(H, X), because every such edge has its endpoint in H inside H ∩ M. Because S contains at least |H ∩ M| vertices from H ∖ M and S' instead contains H ∩ M, the set S' is not larger than S. The set S' is a solution, because G − (S ∪ H) is a 2-plex cluster graph and because G − S' is G − (S ∪ H) with the additional connected component formed by the 2-plex G[H ∖ M]. □

Lemma 2.19. Reduction Rule 2.4 is correct. Given a vertex set M and a solution X, we can exhaustively apply Reduction Rule 2.4 in O(n + m) time.
Proof. Let u be the vertex chosen by Reduction Rule 2.4 and let G' := G − {u}. We have to show that (G', k) is a yes-instance if and only if (G, k) is a yes-instance. If (G, k) is a yes-instance, then there exists a solution S with |S| ≤ k for G. Since G − S is a 2-plex cluster graph, G' − S is a 2-plex cluster graph as well. Thus, (G', k) is a yes-instance.

If (G', k) is a yes-instance, then there exists a solution S with |S| ≤ k for G'. From Lemma 2.18, we can without loss of generality assume that the solution S destroys all edges in E(H, X). Now assume that G − S is not a 2-plex cluster graph. From Lemma 2.9, we can conclude that G − S contains a FISG F including u. The FISG F also contains a vertex v ∈ X ∖ S, because X is a solution. However, the vertices u and v are not connected in G − S, because S destroys all edges in E(H, X). Therefore, F cannot exist in G − S, and S must be a solution for G. Because |S| ≤ k, it follows that (G, k) is a yes-instance.

To prove the running time, recall that we can construct the collection $\mathcal{H}(X)$ in $O(n + m)$ time. Then, in $O(n + m)$ time, we construct a table T such that $T[v] = 1$ for every neighbor v of X. For each $H \in \mathcal{H}(X)$, we now count the vertices in $H \setminus M$ and in $H \cap M$ in $O(|H|)$ time. If, during counting, we find a vertex $v \in H \setminus M$ with $T[v] = 1$, then $H \setminus M$ is adjacent to X. This implies that Reduction Rule 2.4 is not applicable to H, and we continue with the next set in $\mathcal{H}(X)$. The removal of vertices works in $O(n + m)$ time. □
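The linear-time procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the graph is a dictionary of adjacency sets, $\mathcal{H}(X)$ is computed as the connected components of $G - X$, and each set H is processed once, mirroring the single pass in the running-time argument. The function names and the `slack` parameter are hypothetical; `slack=1` corresponds to the threshold of Reduction Rule 2.4, and `slack = 2s - 3` to that of Reduction Rule 3.3.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def components_without(graph: Graph, removed: Set[Hashable]) -> List[Set[Hashable]]:
    """Vertex sets of the connected components of graph - removed, i.e. H(X)."""
    seen = set(removed)
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, stack = {start}, [start]
        while stack:  # iterative depth-first search
            for w in graph[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    stack.append(w)
        components.append(component)
    return components


def apply_rule_2_4(graph: Graph, X: Set[Hashable], M: Set[Hashable],
                   slack: int = 1) -> List[Hashable]:
    """Delete surplus vertices of H - M for each H in H(X) whose H - M sees no vertex of X.

    Returns the deleted vertices; `graph` is modified in place.
    """
    # Table T: T[v] is True iff v is adjacent to some vertex of X.
    T = {v: False for v in graph}
    for x in X:
        for v in graph[x]:
            T[v] = True
    deleted: List[Hashable] = []
    for H in components_without(graph, X):
        outside = [v for v in H if v not in M]  # the vertices of H \ M
        inside = len(H) - len(outside)         # |H ∩ M|
        if any(T[v] for v in outside):
            continue  # H \ M is adjacent to X: the rule is not applicable to H
        # Keep at most |H ∩ M| + slack vertices of H \ M.
        surplus = len(outside) - (inside + slack)
        deleted.extend(outside[:max(surplus, 0)])
    for v in deleted:  # remove the chosen vertices and their incident edges
        for w in graph.pop(v):
            graph[w].discard(v)
    return deleted
```

For example, if X = {0}, M = {1}, and vertex 0 is adjacent only to vertex 1 of a five-vertex clique {1, ..., 5}, then $|H \setminus M| = 4 > |H \cap M| + 1 = 2$, and the sketch deletes two vertices of {2, 3, 4, 5}.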

Corollary 2.3. Let X be a solution. Assume that there is a vertex set M and a set $H \in \mathcal{H}(X)$ such that $H \setminus M$ is nonadjacent to X. After exhaustively applying Reduction Rule 2.4 given M, the set $H \setminus M$ contains at most $|H \cap M| + 1$ vertices.

2.4 Kernel Size 

In this section, we count the total number of vertices remaining in a graph G after all data reduction rules have been applied. To this end, we assume that we have a solution X and a set M that is peripheral with respect to X. Then, we count the vertices in X, the vertices in M, and the vertices in the connected components in $G - X$ that are not in M.

Observe that to bound the sizes of the connected components in $G - X$, which are induced by sets in $\mathcal{H}(X)$, we have presented two data reduction rules in Section 2.3. Reduction Rule 2.3 is applicable to all sets in $\mathcal{H}(X)$. The additional Reduction Rule 2.4 is only applicable to sets in the collection $\mathcal{H}_0(X, M) := \{H \in \mathcal{H}(X) \mid H \setminus M \text{ is nonadjacent to } X\}$. Thus, we separately count the vertices in the sets in $\mathcal{H}_0(X, M)$ and the vertices in the sets in $\mathcal{H}_1(X, M) := \{H \in \mathcal{H}(X) \mid H \setminus M \text{ is adjacent to } X\}$. Figure 2.4(a) shows an example of a set in $\mathcal{H}_0(X, M)$; Figure 2.4(b) shows an example of a set in $\mathcal{H}_1(X, M)$. We already made this distinction when we bounded the number of sets in $\mathcal{H}(X)$ in Section 2.2.2; it is not the only distinction we make:

Definition 2.2(3) for peripheral sets ensures that if there is more than one set $H \in \mathcal{H}_1(X, M)$ such that a vertex $v \in X$ is adjacent to $H \setminus M$, then each such set H satisfies $|H \setminus M| \le 2$. To allow for a tighter worst-case analysis, we count the vertices in such sets separately. To this end, we use the following lemma:
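Lemma 2.20. Let the set M be peripheral with respect to a solution X. For the sets

$$X_1 := \{v \in X \mid \text{there is exactly one set } H \in \mathcal{H}(X) \text{ such that } H \setminus M \text{ is adjacent to } v\},$$

$$X_2 := \{v \in X \mid \text{there are two or no sets } H \in \mathcal{H}(X) \text{ such that } H \setminus M \text{ is adjacent to } v\} = X \setminus X_1$$

(where the equality holds because M is peripheral and because of Definition 2.2(1)), and

$$\widetilde{\mathcal{H}} := \{H \in \mathcal{H}_1(X, M) \mid H \setminus M \text{ is adjacent only to vertices in } X_1\},$$

the following relations hold:

$$\sum_{H \in \widetilde{\mathcal{H}}} |N(H \setminus M) \cap X| \le |X_1| \quad\text{and}\quad |\widetilde{\mathcal{H}}| \le |X_1| \quad\text{and}\quad |\mathcal{H}_1(X, M) \setminus \widetilde{\mathcal{H}}| \le 2|X_2|.$$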

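Proof. Let $H \in \widetilde{\mathcal{H}}$, that is, $H \setminus M$ is adjacent only to vertices in $X_1$. For a vertex $v \in X_1$ that is adjacent to $H \setminus M$, there is by definition of $X_1$ no other set $H' \in \mathcal{H}_1(X, M)$ such that $H' \setminus M$ is adjacent to v. Thus, if we count the vertices in $N(H \setminus M) \cap X$ over all $H \in \widetilde{\mathcal{H}}$, then we count every vertex $v \in X_1$ at most once and no vertex outside $X_1$. This proves the first relation.

For each $H \in \widetilde{\mathcal{H}} \subseteq \mathcal{H}_1(X, M)$, there is by definition of $\mathcal{H}_1(X, M)$ and of $\widetilde{\mathcal{H}}$ at least one vertex $v \in X_1$ such that $H \setminus M$ is adjacent to v. Since such a vertex v is adjacent to $H \setminus M$ for only one set H, distinct sets in $\widetilde{\mathcal{H}}$ yield distinct vertices in $X_1$. Thus, $|\widetilde{\mathcal{H}}| \le |X_1|$.

According to Definition 2.2(1), for every vertex $v \in X$, there are at most two sets $H \in \mathcal{H}(X)$ such that $H \setminus M$ is adjacent to v. The collection $\mathcal{H}_1(X, M) \setminus \widetilde{\mathcal{H}}$ only contains sets H such that a vertex in $X_2$ is adjacent to $H \setminus M$. This yields $|\mathcal{H}_1(X, M) \setminus \widetilde{\mathcal{H}}| \le 2|X_2|$. □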
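To make the case distinction concrete, the sets $X_1$, $X_2$, $\mathcal{H}_1(X, M)$, and $\widetilde{\mathcal{H}}$ can be computed directly from their definitions. The Python sketch below assumes adjacency sets and a precomputed list of the components in $\mathcal{H}(X)$; the function name is hypothetical, sets are returned as indices into that list, and the code only classifies, it does not verify that M is peripheral.

```python
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def lemma_2_20_sets(graph: Graph, X: Set[Hashable], M: Set[Hashable],
                    components: List[Set[Hashable]]) -> Tuple[set, set, List[int], List[int]]:
    """Return X1, X2, indices of H_1(X, M), and indices of the subcollection of Lemma 2.20.

    `components` is the collection H(X), i.e. the vertex sets of the components of G - X.
    """
    # For every v in X, the indices i such that components[i] - M is adjacent to v.
    touching = {v: set() for v in X}
    for i, H in enumerate(components):
        for u in H - M:
            for v in graph[u] & X:
                touching[v].add(i)
    X1 = {v for v in X if len(touching[v]) == 1}
    X2 = set(X) - X1
    # H_1(X, M): sets whose part outside M is adjacent to X.
    H1 = [i for i, H in enumerate(components) if any(graph[u] & X for u in H - M)]
    # Subcollection of H_1(X, M) whose part outside M sees only vertices of X1.
    H_tilde = [i for i in H1 if all(graph[u] & X <= X1 for u in components[i] - M)]
    return X1, X2, H1, H_tilde
```

In a small example where vertex 0 sees one component and vertex 9 sees two, the sketch yields $X_1 = \{0\}$, $X_2 = \{9\}$, and only the component seen by 0 in $\widetilde{\mathcal{H}}$, so all three relations of Lemma 2.20 hold.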

Given a solution X and a set M that is peripheral with respect to X, we now assume that all data reduction rules have been exhaustively applied to our input graph G and count the vertices in the connected components in $G - X$ that are not in M.

Lemma 2.21. Let X be a solution and let the set M be peripheral with respect to X. After exhaustively applying Reduction Rule 2.2, Reduction Rule 2.3, and Reduction Rule 2.4, it holds that

$$\Big|\bigcup_{H \in \mathcal{H}(X)} (H \setminus M)\Big| \le (k + 5)|X| + 2|M|.$$

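Proof. Let $\widetilde{\mathcal{H}}$, $X_1$, and $X_2$ be as defined in Lemma 2.20. We can conclude from Lemma 2.16 and Corollary 2.3 that $\big|\bigcup_{H \in \mathcal{H}(X)} (H \setminus M)\big|$ is upper-bounded by

$$\sum_{H \in \mathcal{H}_1(X, M)} \bigl(|H \cap M| + 2|N(H \setminus M) \cap X| + k + 3\bigr) + \sum_{H \in \mathcal{H}_0(X, M)} \bigl(|H \cap M| + 1\bigr).$$

Because the sets in $\mathcal{H}(X)$ are pairwise disjoint, the two occurrences of $|H \cap M|$ sum up to a total of at most $|M|$, yielding

$$\sum_{H \in \mathcal{H}_1(X, M)} \bigl(2|N(H \setminus M) \cap X| + k + 3\bigr) + |M| + |\mathcal{H}_0(X, M)|.$$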
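By Lemma 2.8, we have $|\mathcal{H}_0(X, M)| \le |M|$. Thus, the above term is bounded by

$$\sum_{H \in \mathcal{H}_1(X, M)} \bigl(2|N(H \setminus M) \cap X| + k + 3\bigr) + 2|M|.$$

For each set $H \in \mathcal{H}_1(X, M) \setminus \widetilde{\mathcal{H}}$, the set $H \setminus M$ must be adjacent to a vertex from $X_2$. This follows from the definition of $\widetilde{\mathcal{H}}$ in Lemma 2.20 and from the definition of $\mathcal{H}_1(X, M)$. Such a vertex of $X_2$ is adjacent to $H \setminus M$ and hence, by definition of $X_2$, adjacent to two such sets. From Definition 2.2(3), we can therefore conclude that $|H \setminus M| \le 2$, implying that only sets in $\widetilde{\mathcal{H}}$ may actually contain $2|N(H \setminus M) \cap X| + k + 3$ vertices that are not in M. We obtain

$$\Big|\bigcup_{H \in \mathcal{H}(X)} (H \setminus M)\Big| \le \sum_{H \in \widetilde{\mathcal{H}}} \bigl(2|N(H \setminus M) \cap X| + k + 3\bigr) + 2|\mathcal{H}_1(X, M) \setminus \widetilde{\mathcal{H}}| + 2|M|.$$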

Applying Lemma 2.20, we can bound this by

$$2|X_1| + (k + 3)|X_1| + 4|X_2| + 2|M| = (k + 5)|X_1| + 4|X_2| + 2|M|.$$

We can interpret this term as a function of $|X_1|$ and $|X_2|$ with fixed $|X|$ and $k \ge 0$. Subject to the constraint $|X_1| + |X_2| = |X|$, it is maximal for $|X_1| = |X|$ and $|X_2| = 0$, because $k + 5 > 4$. This yields the desired result. □

Theorem 2.1. 2-Plex Cluster Vertex Deletion has a problem kernel containing $(10k + 6)|X| \le 40k^2 + 24k$ vertices. It can be found in $O(kn^2)$ time.

Proof. Given a 2-Plex Cluster Vertex Deletion instance (G, k), we first compute a constant-factor approximate solution X using Algorithm 2.1. Then, we compute a set that is peripheral with respect to X using Algorithm 2.2. We apply Reduction Rule 2.1, from which we obtain a new parameter $k' \le k$ and a peripheral set M with $|M| \le 3k|X|$ according to Corollary 2.2. Finally, we apply Reduction Rule 2.2, Reduction Rule 2.3, and Reduction Rule 2.4 to G. The resulting graph and the new parameter k' constitute our problem kernel.

We first show that after applying all data reduction rules to G, the size of G only depends on the parameter k. To this end, we count the vertices in the solution X, the vertices in the peripheral set M, and the vertices in $G - X$ that are not in the peripheral set M. If (G, k) is a yes-instance, then Corollary 2.1 gives an upper bound of 4k on the number of vertices in the constant-factor approximate solution X. If X is larger, we terminate our kernelization algorithm and output that (G, k) is a no-instance. By applying Reduction Rule 2.1, we obtain a peripheral set M that contains at most $3k|X|$ vertices according to Corollary 2.2. By exhaustively applying Reduction Rule 2.2, Reduction Rule 2.3, and Reduction Rule 2.4 to G, we can use Lemma 2.21 to give a bound of $(k + 5)|X| + 2|M| \le (7k + 5)|X|$ on the number of vertices in $G - X$ that are not in the peripheral set M. Adding $|X|$ and $|M|$, we conclude that G contains at most $(10k + 6)|X| \le 40k^2 + 24k$ vertices.
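Written out as one chain of inequalities, using $|M| \le 3k|X|$ from Corollary 2.2 and $|X| \le 4k$ from Corollary 2.1, the count is:

```latex
\begin{align*}
|V(G)| &\le |X| + |M| + \bigl((k + 5)|X| + 2|M|\bigr) \\
       &= (k + 6)|X| + 3|M| \\
       &\le (k + 6)|X| + 9k|X| = (10k + 6)|X| \\
       &\le (10k + 6) \cdot 4k = 40k^2 + 24k.
\end{align*}
```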

The correctness of Reduction Rule 2.1, Reduction Rule 2.2, Reduction Rule 2.3, and Reduction Rule 2.4 has been shown in Lemma 2.2, Lemma 2.7, Lemma 2.12, and Lemma 2.19, respectively.
Finally, we show the running time of our kernelization algorithm. When we construct an approximate solution X using Algorithm 2.1, we can stop after finding more than k pairwise vertex-disjoint FlSGs, because this implies that (G, k) is a no-instance. Analogously to the proof of Lemma 2.1, it follows that we can construct X in $O(k(n + m))$ time. Algorithm 2.2, Reduction Rule 2.1, Reduction Rule 2.2, Reduction Rule 2.3, and Reduction Rule 2.4 run in $O(kn^2)$ time according to Lemma 2.5, Lemma 2.6, Lemma 2.7, Lemma 2.15, and Lemma 2.19, respectively. □

To solve a 2-Plex Cluster Vertex Deletion instance, we can compute a problem kernel with $O(k^2)$ vertices and reduce this problem kernel to a 4-Hitting Set instance with $O((k^2)^4) = O(k^8)$ sets, as discussed in Chapter 1. Then, we can solve this 4-Hitting Set instance by combining Wahlström's algorithm for 3-Hitting Set [25] with iterative compression, as discussed by Dom et al. [5].

Corollary 2.4. Using 4-Hitting Set, we can solve 2-Plex Cluster Vertex Deletion in $O(3.076^k + k^8 + kn^2)$ time.

Concluding Remarks. Peripheral sets played a central role in all stages of our kernelization algorithm. After we construct a peripheral set M with respect to a solution X using Algorithm 2.2, the peripheral set M helps us to bound the number of connected components in $G - X$ in Section 2.2.2. For a connected component in $G - X$, we use the peripheral set M in Section 2.3.2 to bound the number of vertices that are not in the redundant set constructed by Algorithm 2.3. Then, we remove vertices from that redundant set to bound the overall size of the connected component. In Section 2.3.3, we use the set of edges between M and X as a separator to develop an additional data reduction rule that further reduces the sizes of the connected components in $G - X$.

To construct a set M that is peripheral with respect to a solution X, we employ Algorithm 2.2. We could also construct M by enumerating all minimal FlSGs in G, which are shown in Figure 1.1. Then, for each vertex $v \in X$, we could pick an inclusion-maximal set of FlSGs that pairwise intersect only in v. However, because each minimal FlSG contains four vertices, the total number of minimal FlSGs in a graph with n vertices is $O(n^4)$. In contrast, Algorithm 2.2 finds at most $O(n)$ FlSGs for each vertex $v \in X$ and runs in $O(kn^2)$ time. Therefore, enumerating all minimal FlSGs in a graph might be significantly slower than Algorithm 2.2.
3 Kernelization for s-Plex Cluster Vertex Deletion



In this chapter, we generalize the problem kernel for 2-Plex Cluster Vertex Deletion to s-Plex Cluster Vertex Deletion. We will see that many definitions and lemmas that we have worked out for the case s = 2 also work for general s if we modify them slightly. In Section 3.1, we first show how to find an approximate solution X for a graph G such that $G - X$ is an s-plex cluster graph. Then, we generalize our concept of a peripheral set and show how to find one. In Section 3.2, we revise our data reduction rules to bound the number and the sizes of the connected components in $G - X$. In Section 3.3, we obtain a problem kernel with $O(k^2s^3)$ vertices for s-Plex Cluster Vertex Deletion.

We now turn our attention to the main difference between 2-Plex Cluster Vertex 
Deletion and s-Plex Cluster Vertex Deletion. For 2-Plex Cluster Vertex 
Deletion, we used the fact that a 2-plex containing at least three vertices is connected. 
We used this fact to construct a peripheral set using Algorithm 2.2, in the correctness 
proof of Reduction Rule 2.3, and in the correctness proof of Reduction Rule 2.4. To 
generalize these proofs, we need the following result: 

Lemma 3.1. An s-plex containing at least $2s - 1$ vertices is a connected graph.

Proof. Let G = (V, E) be an s-plex with more than one connected component. Because G is an s-plex, every vertex in G is nonadjacent to at most $s - 1$ other vertices in G.

Let $W \subseteq V$ be the vertex set of a connected component of G. Because a vertex in W is nonadjacent to all vertices in $V \setminus W$, and a vertex in $V \setminus W$ is nonadjacent to all vertices in W, we have $|V \setminus W| \le s - 1$ and $|W| \le s - 1$. Therefore, $|V| \le 2s - 2$. Thus, if an s-plex contains at least $2s - 1$ vertices, it must be a connected graph. □

Note that the bound given in Lemma 3.1 is tight. Consider two disjoint cliques with $s - 1$ vertices each. Together, they form a single s-plex with $2s - 2$ vertices, because every vertex is nonadjacent to exactly $s - 1$ other vertices, yet this s-plex is not connected.
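Lemma 3.1 and its tightness can be checked by brute force for small s. The following Python sketch is an illustration, not part of the thesis's algorithms: it enumerates all graphs on $2s - 1$ labeled vertices and confirms that every s-plex among them is connected, and it builds the two-clique example on $2s - 2$ vertices. All function names are hypothetical. Enumeration is exponential in $s^2$, so it is only practical for $s \le 3$.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def adjacency(n: int, edges: Sequence[Edge]) -> List[Set[int]]:
    """Adjacency sets of a graph with vertices 0, ..., n - 1."""
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def is_s_plex(n: int, edges: Sequence[Edge], s: int) -> bool:
    """Every vertex is nonadjacent to at most s - 1 other vertices."""
    adj = adjacency(n, edges)
    return all(n - 1 - len(adj[v]) <= s - 1 for v in range(n))


def is_connected(n: int, edges: Sequence[Edge]) -> bool:
    if n == 0:
        return True
    adj = adjacency(n, edges)
    seen, stack = {0}, [0]
    while stack:  # iterative depth-first search from vertex 0
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def lemma_3_1_holds(s: int) -> bool:
    """Brute force: is every s-plex on 2s - 1 vertices connected?"""
    n = 2 * s - 1
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):  # every graph on n labeled vertices
        edges = [p for i, p in enumerate(pairs) if (mask >> i) & 1]
        if is_s_plex(n, edges, s) and not is_connected(n, edges):
            return False
    return True


def two_cliques(s: int) -> Tuple[int, List[Edge]]:
    """Tightness example: two disjoint cliques with s - 1 vertices each."""
    c = s - 1
    edges = list(combinations(range(c), 2)) + list(combinations(range(c, 2 * c), 2))
    return 2 * c, edges
```

For instance, `lemma_3_1_holds(3)` checks all $2^{10}$ graphs on five vertices, while `two_cliques(3)` yields two disjoint edges, a disconnected 3-plex on four vertices.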

3.1 Approximate Solutions and Peripheral Sets 

Given an s-Plex Cluster Vertex Deletion instance (G, k), in this section we first 
show how to find an approximate solution X for G. We then generalize our concept of 
peripheral sets and construct a set that is peripheral with respect to the solution X. 

Similarly to the case s = 2, we can easily find a constant-factor approximate solution for s-Plex Cluster Vertex Deletion using the algorithm by Guo et al. [11], which finds an $O(s + \sqrt{s})$-vertex FlSG in $O(s(n + m))$ time if we apply it to a graph that is not an s-plex cluster graph. In particular, if $T_s$ is the maximum integer satisfying $T_s \cdot (T_s + 1) < s$, then Guo et al. [11] show that their algorithm finds a FlSG with at most $s + 1 + T_s$ vertices. Similarly to Lemma 2.1, we can show that Algorithm 2.1 computes a constant-factor approximate solution for s-Plex Cluster Vertex Deletion.

Lemma 3.2. There is a factor-$(s + 1 + T_s)$ approximate solution for s-Plex Cluster Vertex Deletion, and it can be found in $O(ns(n + m))$ time.

Corollary 3.1. Let (G, k) be a yes-instance and let X be a factor-$(s + 1 + T_s)$ approximate solution for G. Then, X contains $O(sk)$ vertices.

We now construct a set that is peripheral with respect to a solution X. To this end, we 
modify Definition 2.2. 

Definition 3.1. Let X be a solution. We call a vertex set M with the following properties peripheral (with respect to X):

1. For each vertex $v \in X$, there are at most s sets $H \in \mathcal{H}(X)$ such that $H \setminus M$ is adjacent to v.

2. If there is a vertex $v \in X$ and a set $H \in \mathcal{H}(X)$ such that $H \setminus M$ is adjacent to v, then v is nonadjacent to at most $2s - 3$ vertices in $H \setminus M$.

3. For each vertex $v \in X$, if there is more than one set $H \in \mathcal{H}(X)$ such that $H \setminus M$ is adjacent to v, then each such set H satisfies $|H \setminus M| \le 2s - 2$.

To construct a peripheral set, we proceed analogously to Section 2.2: for each vertex v in a given solution X, we find a FlSG F including v that contains no vertices from M(v). Then, we add the vertices of $F - \{v\}$ to M(v). We find such FlSGs based on three observations, each leading to one of the three phases of an algorithm that constructs the sets M(v).

We now turn to our first observation. Given a solution X, assume that there exists a vertex $v \in X$ and a set $U \subseteq N(v) \setminus X$ of $s + 1$ neighbors of v such that U contains a vertex u that is nonadjacent to $U \setminus \{u\}$. Then, in $G[U \cup \{v\}]$, the vertex u is connected to every vertex in U, because the vertices in U are neighbors of v. The vertex u is nonadjacent to the s vertices in $U \setminus \{u\}$. By Theorem 1.1, the graph $G[U \cup \{v\}]$ is a FlSG.

Algorithm 3.1 (Phase 1). Given a graph G and a solution X, for each vertex $v \in X$, let $M(v) = \emptyset$. For each $v \in X$, as long as there is a set $U \subseteq N(v) \setminus (X \cup M(v))$ such that

1. $|U| = s + 1$ and

2. there exists a vertex $u \in U$ that is nonadjacent to $U \setminus \{u\}$,

add the vertices in U to M(v).
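A direct, unoptimized way to carry out Phase 1 is to look, among the candidate neighbors $N(v) \setminus (X \cup M(v))$, for a vertex u with at least s non-neighbors; u together with s of those non-neighbors then forms a set U as required. The Python sketch below assumes adjacency sets and a dictionary `M` mapping each $v \in X$ to M(v); the function name is hypothetical, and the sketch does not attempt to meet the running time of Lemma 3.4.

```python
from itertools import islice
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def phase1(graph: Graph, X: Set[Hashable], s: int,
           M: Dict[Hashable, Set[Hashable]]) -> None:
    """Phase 1 of Algorithm 3.1: grow each set M[v] in place."""
    for v in X:
        while True:
            candidates = graph[v] - X - M[v]  # N(v) \ (X ∪ M(v))
            U = None
            for u in candidates:
                # Candidates other than u that are nonadjacent to u.
                non_neighbors = candidates - graph[u] - {u}
                if len(non_neighbors) >= s:
                    # |U| = s + 1 and u is nonadjacent to U \ {u}.
                    U = {u} | set(islice(non_neighbors, s))
                    break
            if U is None:
                break  # no further set U exists for v
            M[v] |= U
```

For s = 2, a vertex v adjacent to three pairwise nonadjacent vertices yields M(v) = all three of them, whereas three neighbors that form a triangle leave M(v) empty.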

Now, for each vertex v in the solution X, let M(v) be the set constructed by Phase 1 of Algorithm 3.1. For a vertex $v \in X$, assume that there exists a set $H \in \mathcal{H}(X)$ such that v is adjacent to a vertex $u \in H \setminus M(v)$. Further, assume that the vertex v is nonadjacent to a set $W \subseteq H \setminus M(v)$ of $2s - 2$ vertices. Then, the graph $G[\{u\} \cup W]$ is an induced subgraph of the s-plex G[H], hence itself an s-plex, and contains $2s - 1$ vertices. According to Lemma 3.1, it is connected. Because v is a neighbor of u and $G[\{u\} \cup W]$ is connected, the vertex v is connected but nonadjacent to the $2s - 2$ vertices in W. By Theorem 1.1, the graph $G[\{u, v\} \cup W]$ is a FlSG. We continue Algorithm 3.1 as follows:
Algorithm 3.1 (Phase 2). For each $v \in X$, as long as there is a set $H \in \mathcal{H}(X)$ such that

1. the vertex v is adjacent to a vertex $u \in H \setminus M(v)$ and

2. the vertex v is nonadjacent to a set $W \subseteq H \setminus M(v)$ of vertices with $|W| = 2s - 2$,

add the vertex u and the vertices in W to M(v).
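Phase 2 can be sketched in the same style. This Python sketch assumes that the collection $\mathcal{H}(X)$ is given as a precomputed list of vertex sets and that `M` maps each $v \in X$ to M(v); the function name is hypothetical, and the loop rescans all sets after each addition instead of achieving the running time of Lemma 3.4.

```python
from itertools import islice
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def phase2(graph: Graph, X: Set[Hashable], s: int,
           components: List[Set[Hashable]], M: Dict[Hashable, Set[Hashable]]) -> None:
    """Phase 2 of Algorithm 3.1; `components` is H(X) and M maps v to M(v)."""
    for v in X:
        found = True
        while found:
            found = False
            for H in components:
                rest = H - M[v]                  # H \ M(v)
                neighbors = rest & graph[v]      # candidates for u
                non_neighbors = rest - graph[v]  # candidates for W
                if neighbors and len(non_neighbors) >= 2 * s - 2:
                    u = next(iter(neighbors))
                    W = set(islice(non_neighbors, 2 * s - 2))
                    M[v] |= {u} | W  # G[{u, v} ∪ W] is a FlSG
                    found = True
                    break
```

For s = 2 and a triangle H = {1, 2, 3} in which v sees only vertex 1, the sketch adds all of H to M(v); if v sees two vertices of the triangle, it is nonadjacent to only one vertex and nothing is added.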

Now, for each vertex v in the solution X, let M(v) be the set constructed by Phase 1 and Phase 2 of Algorithm 3.1. Assume that for a vertex $v \in X$, there are two sets $U, W \in \mathcal{H}(X)$ such that v is adjacent to vertices $u \in U \setminus M(v)$ and $w \in W \setminus M(v)$. Further, assume that $W \setminus M(v)$ contains at least $2s - 1$ vertices. Then, $G[W \setminus M(v)]$ is an s-plex containing at least $2s - 1$ vertices. According to Lemma 3.1, it is connected. The vertex $u \in U \setminus M(v)$ is nonadjacent to the at least $2s - 1$ vertices in $W \setminus M(v)$, but $F := G[\{u, v\} \cup (W \setminus M(v))]$ is connected. According to Theorem 1.1, F is a FlSG.

Algorithm 3.1 (Phase 3). For each $v \in X$, as long as there are two sets $U, W \in \mathcal{H}(X)$ such that

1. $|W \setminus M(v)| \ge 2s - 1$ and

2. the vertex v has neighbors $u \in U \setminus M(v)$ and $w \in W \setminus M(v)$,

add the vertices u, w, and $2s - 2$ other vertices from $W \setminus M(v)$ to M(v).
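A sketch of Phase 3 under the same assumptions (precomputed $\mathcal{H}(X)$ as a list, dictionary `M`, hypothetical function name) follows. Each round adds u, w, and $2s - 2$ further vertices of $W \setminus M(v)$, that is, 2s new vertices per FlSG, which is the count used later in the proof of Lemma 3.5.

```python
from itertools import islice
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def phase3(graph: Graph, X: Set[Hashable], s: int,
           components: List[Set[Hashable]], M: Dict[Hashable, Set[Hashable]]) -> None:
    """Phase 3 of Algorithm 3.1; `components` is H(X) and M maps v to M(v)."""
    for v in X:
        found = True
        while found:
            found = False
            # Indices of sets H in H(X) such that v has a neighbor in H \ M(v).
            touched = [i for i, H in enumerate(components) if (H - M[v]) & graph[v]]
            for i in touched:
                rest_W = components[i] - M[v]  # W \ M(v)
                others = [j for j in touched if j != i]  # candidates for U
                if len(rest_W) < 2 * s - 1 or not others:
                    continue
                u = next(iter((components[others[0]] - M[v]) & graph[v]))
                w = next(iter(rest_W & graph[v]))
                extra = set(islice(rest_W - {w}, 2 * s - 2))
                M[v] |= {u, w} | extra  # 2s new vertices of a FlSG containing v
                found = True
                break
```

For s = 2, if v sees vertex 1 of a triangle {1, 2, 3} and the single vertex 4 of another component, the sketch adds {1, 2, 3, 4} to M(v).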

Note that in contrast to Algorithm 2.2, Phase 2 and Phase 3 of Algorithm 3.1 do not 
necessarily find minimal FlSGs. That is, there exist FlSGs found by Phase 2 and Phase 3 
such that we could remove a vertex from them and they would still be FlSGs. For 
running time considerations, we construct FlSGs from parts of s-plexes that contain 
enough vertices to derive their connectedness from Lemma 3.1. Thus, we do not have to 
explicitly check whether the subgraphs that we find are connected. 

Lemma 3.3. Let X be a solution. Let $M := \bigcup_{v \in X} M(v)$ be the set constructed by Algorithm 3.1. The set M is peripheral with respect to X.

Proof. The proof of this lemma is analogous to the proof of Lemma 2.3. For each vertex $v \in X$, the set M(v) satisfies all properties in Definition 3.1. This follows directly from the description of Algorithm 3.1. □

Lemma 3.4. Given a solution X, Algorithm 3.1 can be carried out in $O(|X|n^2)$ time.

Proof. The running times of Phase 1 and Phase 2 of Algorithm 3.1 can be proven in the same way as for Algorithm 2.2 in Lemma 2.5. We only prove the running time of the modified Phase 3. First, we construct for each vertex $v \in X$ the set $N(v) \setminus (M(v) \cup X)$. The proof of Lemma 2.5 shows how this can be done in $O(n)$ time. For each vertex $u \in N(v) \setminus (M(v) \cup X)$, we can determine the set $H \in \mathcal{H}(X)$ with $u \in H$ in constant time, as seen in the proof of Lemma 2.5. Counting the elements in $H \setminus M(v)$ takes at most $O(n)$ time. This yields a running time of $O(|X|n^2)$ for Phase 3 of Algorithm 3.1. □

3.2 Adapted Data Reduction Rules and Bounds 

Given an s-Plex Cluster Vertex Deletion instance (G, k) and a solution X for G, we now bound the number and the sizes of the connected components in $G - X$. To this end, we first revise Reduction Rule 2.1 as shown below.
Reduction Rule 3.1. Let X be a solution. For each vertex $v \in X$, let M(v) be the set constructed by Algorithm 3.1. If there exists a vertex $v \in X$ such that $|M(v)| > 2sk$, then delete v from G and X and decrement k by one.

Lemma 3.5. Reduction Rule 3.1 is correct. Given a solution X and the set M(v) constructed by Algorithm 3.1 for each vertex $v \in X$, we can exhaustively apply Reduction Rule 3.1 in $O(|X|n + m)$ time.

Proof. For each vertex v in a solution X, Algorithm 3.1 adds at most 2s vertices to M(v) for each found FlSG. If a vertex $v \in X$ satisfies $|M(v)| > 2sk$, then more than k FlSGs pairwise intersect only in v. According to Lemma 2.2, we can delete v from G and decrement k by one. The running time can be shown analogously to Lemma 2.6. □
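Because every deletion also decrements k, and thus lowers the threshold 2sk, exhaustive application has to rescan X after each deletion. The Python sketch below assumes the sets M(v) were already computed by Algorithm 3.1 and are stored in a dictionary; the function name is hypothetical. The sets M(w) of the remaining vertices stay valid, because every FlSG found by Algorithm 3.1 for a vertex w consists of w and vertices outside X.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def apply_rule_3_1(graph: Graph, X: Set[Hashable], M: Dict[Hashable, Set[Hashable]],
                   s: int, k: int) -> int:
    """Exhaustively apply Reduction Rule 3.1 in place and return the new parameter.

    A negative return value means that (G, k) is a no-instance.
    """
    changed = True
    while changed and k >= 0:
        changed = False
        for v in list(X):
            if len(M[v]) > 2 * s * k:
                # More than k FlSGs pairwise intersect only in v: delete v.
                for w in graph.pop(v):
                    graph[w].discard(v)
                X.discard(v)
                del M[v]
                k -= 1
                changed = True
                break  # the threshold 2sk has decreased, so rescan X
    return k
```

With s = 2 and k = 2, a vertex v with $|M(v)| = 9 > 8$ is deleted and the parameter drops to 1.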

Corollary 3.2. Let X be a solution for G. For each $v \in X$, let M(v) be the set constructed by Algorithm 3.1. After exhaustively applying Reduction Rule 3.1 to G and X, the peripheral set $M := \bigcup_{v \in X} M(v)$ contains at most $2sk|X|$ vertices.

Given a graph G and a solution X, we can apply Reduction Rule 2.2 without any changes compared to the case s = 2. As we have seen in Section 2.2.2, Lemma 2.8 and Definition 2.2 then bound the number of connected components in $G - X$. It remains to bound their sizes. To this end, we only need to slightly change Reduction Rule 2.3 and Reduction Rule 2.4. Recall that the connected components in $G - X$ are induced by sets in the collection $\mathcal{H}(X)$. We start with a revision of Reduction Rule 2.3:

Reduction Rule 3.2. Let X be a solution, let $H \in \mathcal{H}(X)$, and let R(H) be a redundant subset of H as defined in Definition 2.3. If $|R(H)| > k + 2s - 1$, choose an arbitrary vertex from R(H) and remove it from G.

For the correctness proof of Reduction Rule 3.2, observe that Lemma 2.10 and Lemma 2.11 
are still valid if we prove them under the following assumption instead of proving them 
under Assumption 2.1: 

Assumption 3.1. Let X be a solution and let R(H) be a redundant subset of $H \in \mathcal{H}(X)$. Assume that Reduction Rule 3.2 chooses to remove $u \in R(H)$ from G. Further, assume that there exists a solution S with $|S| \le k$ for the graph $G - \{u\}$.

Assumption 3.1 implies that $|R(H) \setminus (S \cup \{u\})| \ge 2s - 1$; otherwise, Reduction Rule 3.2 could not have been applied. Because G[H] is an s-plex, $G[R(H) \setminus (S \cup \{u\})]$ is an s-plex with at least $2s - 1$ vertices and thus connected by Lemma 3.1. The graph $G[H \setminus (S \cup \{u\})]$ is connected for the same reason. In the following, we write G' for $G - \{u\}$.

Lemma 3.6. Reduction Rule 3.2 is correct. 

Proof. We have to show that (G, k) is a yes-instance if and only if (G', k) is a yes-instance. If (G, k) is a yes-instance, then there exists a solution S with $|S| \le k$ such that $G - S$ is an s-plex cluster graph. Clearly, then also $G' - S$ is an s-plex cluster graph and (G', k) is a yes-instance.
If (G', k) is a yes-instance, then there exists a solution S with $|S| \le k$ such that $G' - S$ is an s-plex cluster graph, implying that Assumption 3.1 is true. Assume that $G - S$ contains a FlSG. By Lemma 2.9, there exists a FlSG F in $G - S$ containing the vertex u. Because F is a FlSG, it contains a vertex v that is connected but nonadjacent to a set W of s other vertices in F.

If $u \notin \{v\} \cup W$, then Lemma 2.10 shows that the vertices in $\{v\} \cup W$ are connected to all vertices in $H \setminus (S \cup \{u\})$. Thus, the vertices in $\{v\} \cup W$ would exist in $G' - S$ and would be connected. That contradicts $G' - S$ being an s-plex cluster graph, because v is nonadjacent but connected to the s vertices in W. Thus, we have $u \in \{v\} \cup W$.

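First, assume that u = v. That is, the vertex $u \in R(H)$ is nonadjacent to W. From Lemma 2.11, we can conclude that $W \subseteq H \setminus S$. Because also $u \in H \setminus S$, the vertex u is nonadjacent to the s vertices of W in $G[H \setminus S]$; this contradicts the graph $G[H \setminus S]$ being an s-plex. Thus, we have $u \in W$.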

Because $u \in W$, the vertex u is nonadjacent to v. From Lemma 2.11, we conclude that $v \in H \setminus S$. By Definition 2.3, there exists an X-module Z(H) with $R(H) \subseteq Z(H) \subseteq H$ and $v \in Z(H)$, because the vertex $v \in H \setminus S$ is nonadjacent to the vertex $u \in R(H)$. But then, because the vertex $v \in Z(H)$ is nonadjacent to W, the vertices in W must also be in $H \setminus S$ by Lemma 2.11. Because also v is in $H \setminus S$, this again contradicts $G[H \setminus S]$ being an s-plex. We conclude that $G - S$ must be an s-plex cluster graph. Thus, (G, k) is a yes-instance. □

We employ Algorithm 2.3 to construct redundant sets. For a solution X, the bound on the number of vertices in a connected component in $G - X$ then changes as follows:

Lemma 3.7. Let the set M be peripheral with respect to a solution X. For a vertex set $H \in \mathcal{H}(X)$, let R(H) be the redundant subset constructed by Algorithm 2.3. After exhaustively applying Reduction Rule 3.2 using R(H), the number of vertices in $H \setminus M$ is $O(s|H \cap M| + s^2|N(H \setminus M) \cap X| + k)$.

Proof. To prove the above lemma, we study the sets constructed in Algorithm 2.3. By construction of R(H), we have $\overline{R}(H) = H \setminus R(H)$. Observe that because $R(H) \subseteq H$, we also have $H \setminus \overline{R}(H) = R(H)$. Because G[H] is an s-plex, for every vertex $w \in H \cap M$ there exist at most $s - 1$ vertices $u \in H$ such that u and w are nonadjacent. Thus, we have $|A(H)| \in O(s|H \cap M|)$. Because M is peripheral, we can conclude from Definition 3.1 that for each vertex $w \in N(H \setminus M) \cap X$, there are at most $2s - 3$ vertices $u \in H$ such that u and w are nonadjacent. Thus, we have $|B(H)| \in O(s|N(H \setminus M) \cap X|)$. Now, again because G[H] is an s-plex, for every vertex $w \in B(H)$ there exist at most $s - 1$ vertices $u \in H$ such that u and w are nonadjacent. Thus, we have $|C(H)| \in O(s|B(H)|) \subseteq O(s^2|N(H \setminus M) \cap X|)$. This shows that the number of vertices in $H \setminus (R(H) \cup M)$ is $O(s|H \cap M| + s^2|N(H \setminus M) \cap X|)$. After exhaustively applying Reduction Rule 3.2, the number of vertices in R(H) is O(k). □

Let M be peripheral with respect to a solution X. We now revise Reduction Rule 2.4 to reduce the sizes of the connected components in $G - X$ that are induced by sets $H \in \mathcal{H}(X)$ such that $H \setminus M$ is nonadjacent to X. Refer to Figure 2.4(a) for an example.
Reduction Rule 3.3. Let X be a solution and let $H \in \mathcal{H}(X)$. Given a vertex set M such that $H \setminus M$ is nonadjacent to X, if $|H \setminus M| > |H \cap M| + 2s - 3$, then choose a vertex from $H \setminus M$ and remove it from G.

Lemma 3.8. Reduction Rule 3.3 is correct. Given a vertex set M, we can exhaustively apply Reduction Rule 3.3 in $O(n + m)$ time.

Proof. Let S be a solution. First, observe that analogously to the proof of Lemma 2.17, we can show that if S does not destroy all edges between vertices in H and X, then it must contain at least $|H \setminus M| - (2s - 3)$ vertices from $H \setminus M$. If $|H \setminus M| \ge |H \cap M| + 2s - 3$, then, analogously to the proof of Lemma 2.18, we can find a solution S' with $|S'| \le |S|$ that destroys all edges between vertices in H and X. From this, Lemma 3.8 follows analogously to Lemma 2.19. □

Corollary 3.3. Let X be a solution. Assume that there is a vertex set M and a set $H \in \mathcal{H}(X)$ such that $H \setminus M$ is nonadjacent to X. After exhaustively applying Reduction Rule 3.3 given M, the number of vertices in $H \setminus M$ is $O(s + |H \cap M|)$.

3.3 Kernel Size 

Given an s-Plex Cluster Vertex Deletion instance (G, k), we now give a bound on the number of vertices in G after all data reduction rules have been applied. Given a solution X, recall that for a connected component in $G - X$ that is induced by a set $H \in \mathcal{H}_0(X, M)$, the set $H \setminus M$ is nonadjacent to X (cf. Figure 2.4(a)); for a vertex set $H \in \mathcal{H}_1(X, M)$, the set $H \setminus M$ is adjacent to X (cf. Figure 2.4(b)). We handle connected components induced by vertex sets in $\mathcal{H}_1(X, M)$ and $\mathcal{H}_0(X, M)$ separately.

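Lemma 3.9. Let the set M be peripheral with respect to a solution X. For the sets

$$X_1 := \{v \in X \mid \text{there is exactly one set } H \in \mathcal{H}(X) \text{ such that } H \setminus M \text{ is adjacent to } v\},$$

$$X_2 := X \setminus X_1, \quad\text{and}$$

$$\widetilde{\mathcal{H}} := \{H \in \mathcal{H}_1(X, M) \mid H \setminus M \text{ is adjacent only to vertices in } X_1\},$$

the following relations hold:

$$\sum_{H \in \widetilde{\mathcal{H}}} |N(H \setminus M) \cap X| \le |X_1| \quad\text{and}\quad |\widetilde{\mathcal{H}}| \le |X_1| \quad\text{and}\quad |\mathcal{H}_1(X, M) \setminus \widetilde{\mathcal{H}}| \le s|X_2|.$$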

Proof. This follows analogously to the proof of Lemma 2.20 with Definition 3.1 for 
peripheral sets. □ 

Theorem 3.1. s-Plex Cluster Vertex Deletion has a problem kernel with $O(k^2s^3)$ vertices. It can be found in $O(ksn^2)$ time.

Proof. Given an s-Plex Cluster Vertex Deletion instance (G, k), we first find a constant-factor approximate solution X for G using Algorithm 2.1. If (G, k) is a yes-instance, we have $|X| \in O(sk)$ according to Corollary 3.1. In this case, we find X in $O(ks(n + m))$ time according to Lemma 3.2, because if we find more than k FlSGs using Algorithm 2.1, then we can stop and output that (G, k) is a no-instance. After constructing the constant-factor approximate solution X, we construct a set M that is peripheral with respect to X. According to Lemma 3.4, this can be done in $O(|X|n^2)$ time using Algorithm 3.1. According to Corollary 3.2, we can use Reduction Rule 3.1 to reduce the size of M to at most $2sk|X|$ vertices. This can be done in $O(|X|n + m)$ time according to Lemma 3.5. We then apply Reduction Rule 2.2 in $O(n + m)$ time as shown in Lemma 2.7, followed by Reduction Rule 3.2. Analogously to the proof of Lemma 2.15, we can show that this works in $O(n^2)$ time. Finally, we apply Reduction Rule 3.3, which runs in $O(n + m)$ time; this follows analogously to the proof of Lemma 2.19.

We now count the vertices that remain in G. The graph G contains vertices from X, vertices from M, and vertices from the connected components in $G - X$ that are not in M. As shown above, we have $|X| \in O(sk)$ and $|M| \in O(s^2k^2)$. It remains to count the vertices in $\bigcup_{H \in \mathcal{H}(X)} (H \setminus M)$. Let $\widetilde{\mathcal{H}}$, $X_1$, and $X_2$ be as defined in Lemma 3.9. We can conclude from Lemma 3.7 and Corollary 3.3 that the size of $\bigcup_{H \in \mathcal{H}(X)} (H \setminus M)$ is

$$O\Big(\sum_{H \in \mathcal{H}_1(X, M)} \bigl(s|H \cap M| + s^2|N(H \setminus M) \cap X| + k\bigr) + \sum_{H \in \mathcal{H}_0(X, M)} \bigl(|H \cap M| + s\bigr)\Big).$$

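Because the sets in $\mathcal{H}(X)$ are pairwise disjoint, we have $\sum_{H \in \mathcal{H}(X)} s|H \cap M| \le s|M|$. Thus, the above term is

$$O\Big(\sum_{H \in \mathcal{H}_1(X, M)} \bigl(s^2|N(H \setminus M) \cap X| + k\bigr) + s|M| + s|\mathcal{H}_0(X, M)|\Big).$$

By Lemma 2.8, we have $|\mathcal{H}_0(X, M)| \le |M|$. Thus, this is

$$O\Big(\sum_{H \in \mathcal{H}_1(X, M)} \bigl(s^2|N(H \setminus M) \cap X| + k\bigr) + s|M|\Big).$$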

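For each set $H \in \mathcal{H}_1(X, M) \setminus \widetilde{\mathcal{H}}$, the set $H \setminus M$ must be adjacent to a vertex from $X_2$. This follows from the definition of $\widetilde{\mathcal{H}}$ in Lemma 3.9 and from the definition of $\mathcal{H}_1(X, M)$. From Definition 3.1, we can conclude that $|H \setminus M| \in O(s)$, implying that only the sets in $\widetilde{\mathcal{H}}$ may contain $\Omega(s^2|N(H \setminus M) \cap X| + k)$ vertices that are not in M. Thus, we have

$$\Big|\bigcup_{H \in \mathcal{H}(X)} (H \setminus M)\Big| \in O\Big(\sum_{H \in \widetilde{\mathcal{H}}} \bigl(s^2|N(H \setminus M) \cap X| + k\bigr) + s|\mathcal{H}_1(X, M) \setminus \widetilde{\mathcal{H}}| + s|M|\Big).$$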

By Lemma 3.9, this is $O(|X_1|s^2 + k|X_1| + |X_2|s^2 + s|M|)$. Using $|X_1| + |X_2| = |X|$ and adding the vertices in M and X, this is $O((s^2 + k)|X| + s|M|)$. Thus, the total number of vertices in G is $O(k^2s^3)$. □
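The final step can be written out explicitly. Substituting $|X| \in O(sk)$ and $|M| \in O(s^2k^2)$, and using $k \ge 1$:

```latex
\begin{align*}
(s^2 + k)\,|X| + s\,|M|
  &\in O\bigl((s^2 + k)\,sk\bigr) + O\bigl(s \cdot s^2 k^2\bigr) \\
  &= O(s^3 k + s k^2 + s^3 k^2) \subseteq O(k^2 s^3).
\end{align*}
```

The remaining terms $|X| + |M| \in O(sk + s^2k^2)$ are absorbed by the same bound.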
4 Conclusion and Outlook



We have shown an $O(k^2s^3)$-vertex problem kernel for s-Plex Cluster Vertex Deletion. This result is comparable with the $O(ks^2)$-vertex problem kernel for s-Plex Editing shown by Guo et al. [11]: in an n-vertex graph, one vertex deletion can lead to $n - 1$ edge deletions. Under the assumption that input graphs for clustering problems are typically dense, this suggests that typical parameter values for s-Plex Editing are at least quadratic in parameter values for s-Plex Cluster Vertex Deletion; here, the parameter is the number of allowed graph modifications. Seen from this angle, our result seems consistent with the result that s-Plex Editing has a problem kernel with $O(ks^2)$ vertices.

It is open whether s-Plex Cluster Vertex Deletion has an $O(ks^c)$-vertex problem kernel for some constant c. It is also open to improve the $s^3$-factor in the number of vertices in our problem kernel. This factor results from the size of the constant-factor approximate solution shown in Corollary 3.1, from the size of the peripheral set shown in Corollary 3.2, and from the way we construct redundant sets in Lemma 3.7. The most promising approach to improve on the $s^3$-factor seems to be the construction of larger redundant sets so that more vertices can be removed by Reduction Rule 3.2.

In Chapter 1, we discussed how to solve s-Plex Cluster Vertex Deletion using d-Hitting Set for a natural number $d \in O(s + \sqrt{s})$. For d-Hitting Set, problem kernels containing $O(k^{d-1})$ or $O(k^d)$ elements are known [1, 8, 15]. This bound is exponential in d, and d is in turn bounded by a function linear in s. This yields an upper bound on the number of elements in a d-Hitting Set kernel that is exponential in s. From this angle, it is remarkable that problem kernels for s-Plex Cluster Vertex Deletion as well as for s-Plex Editing exist whose number of vertices is bounded by a polynomial in s as well as in k.

It might be hard to find a search tree for s-Plex Cluster Vertex Deletion that is smaller than the search tree for an equivalent d-Hitting Set instance. However, a d-Hitting Set instance obtained from our s-Plex Cluster Vertex Deletion problem kernel contains $O(k^{2d})$ sets. This bound is exponential in d. Thus, constructing a d-Hitting Set instance from an s-Plex Cluster Vertex Deletion instance might be practically infeasible. It is open to find faster algorithms for s-Plex Cluster Vertex Deletion that do not rely on d-Hitting Set.

The most promising approach to faster algorithms for s-Plex Cluster Vertex 
Deletion seems to be iterative compression [12], a technique introduced by Reed et al. [22]. 
Hüffner et al. [14] have successfully applied it to Cluster Vertex Deletion. Using iterative 
compression, we can solve s-Plex Cluster Vertex Deletion by solving multiple 
instances of the following problem: 
Disjoint s-Plex Cluster Vertex Deletion 

Instance: A graph $G = (V, E)$, a non-negative integer $k$, and a solution $S \subseteq V$ 
with $|S| \le k + 1$ such that $G - S$ is an s-plex cluster graph. 

Question: Is there an alternative solution $S' \subseteq V$ with $S \cap S' = \emptyset$ and $|S'| \le k$ such 
that $G - S'$ is an s-plex cluster graph? 
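The following is a minimal sketch of the iterative-compression scheme, assuming adjacency-set dictionaries without self-loops as the graph representation. Vertices are added one at a time; whenever the maintained solution grows to $k + 1$ vertices, the compression step guesses the part $Y = S \cap S'$ of the old solution that stays deleted and solves Disjoint s-Plex Cluster Vertex Deletion on $G - Y$ with parameter $k - |Y|$. The function `disjoint_solve` is a hypothetical brute-force stand-in that tries all small vertex sets outside $S \setminus Y$; it is not a search tree for the disjoint problem, and all function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Optional, Set

Graph = Dict[int, Set[int]]  # hypothetical: vertex -> set of neighbours, no self-loops


def is_s_plex_cluster(g: Graph, vs: Set[int], s: int) -> bool:
    """True iff every connected component of G[vs] is an s-plex."""
    seen: Set[int] = set()
    for start in vs:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            for w in g[stack.pop()] & vs:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        # each vertex may be nonadjacent to at most s - 1 others in its component
        if any(len(comp) - 1 - len(g[u] & comp) > s - 1 for u in comp):
            return False
    return True


def disjoint_solve(g: Graph, s: int, k: int, sol: Set[int]) -> Optional[Set[int]]:
    """Hypothetical brute-force stand-in for the disjoint problem: find S'
    disjoint from sol with |S'| <= k such that G - S' is an s-plex cluster graph."""
    candidates = sorted(set(g) - sol)
    for size in range(k + 1):
        for xs in combinations(candidates, size):
            if is_s_plex_cluster(g, set(g) - set(xs), s):
                return set(xs)
    return None


def compress(g: Graph, s: int, k: int, sol: Set[int]) -> Optional[Set[int]]:
    """Turn a solution of size k + 1 into one of size <= k, or return None."""
    for i in range(len(sol)):  # i = |Y|, where Y = S & S' stays deleted
        for ys in combinations(sorted(sol), i):
            y = set(ys)
            rest = set(g) - y
            h = {u: g[u] & rest for u in rest}  # the graph G - Y
            keep = sol - y  # these vertices must not be deleted
            if not is_s_plex_cluster(h, keep, s):
                continue
            z = disjoint_solve(h, s, k - i, keep)
            if z is not None:
                return y | z
    return None


def s_plex_cvd(g: Graph, s: int, k: int) -> Optional[Set[int]]:
    """Iterative compression over the vertices of g in sorted order."""
    done: Set[int] = set()
    sol: Set[int] = set()
    for v in sorted(g):
        done.add(v)
        sol = sol | {v}  # the old solution plus v solves G[done]
        if len(sol) > k:
            sub = {u: g[u] & done for u in done}
            compressed = compress(sub, s, k, sol)
            if compressed is None:
                # the property is hereditary, so G has no solution either
                return None
            sol = compressed
    return sol
```

Replacing `disjoint_solve` by a faster algorithm for the disjoint problem speeds up the whole procedure without changing the surrounding loop.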

Fellows et al. [7] have shown that, while the analogous problem for Cluster Vertex 
Deletion is in P, Disjoint s-Plex Cluster Vertex Deletion is NP-hard. Based on 
some initial observations on the case $s = 2$, we conjecture that Disjoint 2-Plex Cluster 
Vertex Deletion can be solved using a size-$O(2^k)$ search tree. In combination with 
our kernelization algorithm, we could then solve 2-Plex Cluster Vertex Deletion 
in $O(3^k k^c + kn^2)$ time for some constant $c$. 
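The factor $3^k$ can be traced to the standard counting argument for iterative compression, sketched here under the assumption that the disjoint problem is solved with the conjectured size-$O(2^k)$ search tree. A compression step tries every $Y \subseteq S$ with $|Y| = i \le k$ as the part of the old size-$(k+1)$ solution that stays deleted and then searches a tree of size $O(2^{k-i})$, so, up to a constant factor, the total search-tree size per compression step is at most

$$
\sum_{i=0}^{k} \binom{k+1}{i} \, 2^{k-i}
\;\le\; \sum_{i=0}^{k+1} \binom{k+1}{i} \, 2^{k+1-i}
\;=\; (1 + 2)^{k+1}
\;=\; 3^{k+1} \in O(3^k).
$$

The equality is the binomial theorem. Once the instance has been kernelized, the number of compression steps and the work per search-tree node are polynomial in k, which is presumably what the factor $k^c$ accounts for.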



Bibliography 



[1] Faisal N. Abu-Khzam. Kernelization algorithms for d-hitting set problems. In 
Proceedings of the 10th International Workshop on Algorithms and Data Structures 
(WADS '07), volume 4619 of Lecture Notes in Computer Science, pages 434-445. 
Springer, 2007. 

[2] Balabhaskar Balasundaram, Sergiy Butenko, and Illya V. Hicks. Clique relaxations 
in social network analysis: The maximum k-plex problem. Operations Research, 
2009. To appear. 

[3] Pavel Berkhin. A survey of clustering data mining techniques. In Grouping 
Multidimensional Data, pages 25-71. Springer, 2006. 

[4] Jonathan F. Buss and Judy Goldsmith. Nondeterminism within P. SIAM Journal 
on Computing, 22(3):560-572, 1993. 

[5] Michael Dom, Jiong Guo, Falk Hüffner, Rolf Niedermeier, and Anke Truss. 
Fixed-parameter tractability results for feedback set problems in tournaments. 
Journal of Discrete Algorithms, 2009. 

[6] Rodney G. Downey and Michael R. Fellows. Parameterized Complexity. Springer, 
1999. 

[7] Michael R. Fellows, Jiong Guo, Hannes Moser, and Rolf Niedermeier. A complexity 
dichotomy for finding disjoint solutions of vertex deletion problems. In Proceedings 
of Mathematical Foundations of Computer Science (MFCS '09), volume 5734 of 
Lecture Notes in Computer Science, pages 319-330. Springer, 2009. 

[8] Jörg Flum and Martin Grohe. Parameterized Complexity Theory. Springer, 2006. 

[9] Tibor Gallai. Transitiv orientierbare Graphen. Acta Mathematica Hungarica, 
18(1-2):25-66, 1967. 

[10] Jiong Guo. A more effective linear kernelization for cluster editing. Theoretical 
Computer Science, 410(8-10):718-726, 2009. 

[11] Jiong Guo, Christian Komusiewicz, Rolf Niedermeier, and Johannes Uhlmann. A 
more relaxed model for graph-based data clustering: s-Plex Editing. In Proceedings 
of the 5th International Conference on Algorithmic Aspects in Information and 
Management (AAIM '09), volume 5564 of Lecture Notes in Computer Science, pages 
226-239. Springer, 2009. 



[12] Jiong Guo, Hannes Moser, and Rolf Niedermeier. Iterative compression for exactly 
solving NP-hard minimization problems. In Algorithmics of Large and Complex 
Networks, volume 5515 of Lecture Notes in Computer Science, pages 65-80. Springer, 2009. 

[13] Jiong Guo and Rolf Niedermeier. Invitation to data reduction and problem 
kernelization. SIGACT News, 38(1):31-45, 2007. 

[14] Falk Hüffner, Christian Komusiewicz, Hannes Moser, and Rolf Niedermeier. 
Fixed-parameter algorithms for cluster vertex deletion. Theory of Computing Systems, 
2009. Available electronically. 

[15] Stefan Kratsch. Polynomial kernelizations for MIN F+Π1 and MAX NP. In 
Proceedings of the 26th International Symposium on Theoretical Aspects of Computer 
Science (STACS '09), volume 09001 of Dagstuhl Seminar Proceedings, pages 601-612. 
Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss 
Dagstuhl, Germany, 2009. 

[16] John M. Lewis and Mihalis Yannakakis. The node-deletion problem for hereditary 
properties is NP-complete. Journal of Computer and System Sciences, 20(2):219-230, 
1980. 

[17] Carsten Lund and Mihalis Yannakakis. The approximation of maximum subgraph 
problems. In Proceedings of the 20th International Colloquium on Automata, 
Languages and Programming (ICALP '93), pages 40-51, London, UK, 1993. Springer. 

[18] Benjamin McClosky and Illya V. Hicks. Combinatorial algorithms for the maximum 
k-plex problem. Manuscript, January 2009. 

[19] Ross M. McConnell and Jeremy Spinrad. Modular decomposition and transitive 
orientation. Discrete Mathematics, 201(1-3):189-241, 1999. 

[20] Hannes Moser, Rolf Niedermeier, and Manuel Sorge. Algorithms and experiments for 
clique relaxations — finding maximum s-plexes. In Proceedings of the 8th International 
Symposium on Experimental Algorithms (SEA '09), volume 5526 of Lecture Notes 
in Computer Science, pages 233-244. Springer, 2009. 

[21] Rolf Niedermeier. Invitation to Fixed-Parameter Algorithms. Oxford University 
Press, 2006. 

[22] Bruce Reed, Kaleigh Smith, and Adrian Vetta. Finding odd cycle transversals. 
Operations Research Letters, 32(4):299-301, 2004. 

[23] Satu Elisa Schaeffer. Graph clustering. Computer Science Review, 1(1):27-64, 2007. 

[24] Stephen B. Seidman and Brian L. Foster. A graph-theoretic generalization of the 
clique concept. Journal of Mathematical Sociology, 6:139-154, 1978. 



[25] Magnus Wahlström. Algorithms, Measures and Upper Bounds for Satisfiability and 
Related Problems. PhD thesis, Department of Computer and Information Science, 
Linköpings universitet, Sweden, 2007. 

[26] Bin Wu and Xin Pei. A parallel algorithm for enumerating all the maximal k-plexes. 
In Emerging Technologies in Knowledge Discovery and Data Mining, volume 4819 
of Lecture Notes in Artificial Intelligence, pages 476-483. Springer, 2007. 


